php-readability

Commit Graph

Author	SHA1	Message	Date
Jan Tojnar	5040fc1587	Use method param type hints instead of PHPDoc Since we require PHP 7.4, contravariance in param types is supported, so we do not need to worry about subclasses that widen the param type. It will only be breaking in the unlikely case a subclass uses a type that contradicts the PHPDoc type annotation and does not extend `DOMNode`. Also fix the type annotation since some invocations pass it a `DOMText`, an arbitrary sibling/child `DOMNode` or even `null`.	1 year ago
Jan Tojnar	9a0a088bc6	Throw a BadMethodCallException calling get{Title,Content} uninitialized This is bad state so it should not be a breaking change.	1 year ago
Jan Tojnar	32267cb7b4	Add type annotations to properties To preserve BC, we are not using type hints for now.	1 year ago
Jan Tojnar	4f5360df90	Use helpers for content score manipulation `DOMAttr::$value` must be a `string`. Let’s add helpers for manipulating the `readability` attribute so that we do not have to keep casting it from and to `string` in order to appease `strict_types`.	1 year ago
Jan Tojnar	fae4e78845	Fix missing return value in grabArticle Not sure if this is expected but at least it works the same as before.	1 year ago
Jérémy Benoist	4258559b8a	Merge pull request #100 from jtojnar/phpunit-bridge7 composer: Allow phpunit-bridge 7.0	1 year ago
Jan Tojnar	1ac761d708	composer: Allow phpunit-bridge 7.0	1 year ago
Jérémy Benoist	d3053fbce4	Merge pull request #99 from jtojnar/phpstan2 phpstan: Upgrade to version 2	1 year ago
Jan Tojnar	4c929754e9	phpstan: Upgrade to version 2 https://github.com/phpstan/phpstan/blob/2.1.x/UPGRADING.md Required also bumping Rector since it uses PHPStan internally.	1 year ago
Jan Tojnar	1d7cdf3a12	phpstan: Use standard config path This allows developers to create their own config file, e.g. for setting `editorUrl`: https://phpstan.org/user-guide/output-format#opening-file-in-an-editor	1 year ago
Jérémy Benoist	f825dcf55a	Merge pull request #90 from jtojnar/foreaches Iterate node lists with foreach	1 year ago
Jan Tojnar	9a9373de4b	Iterate node lists with `foreach` `DOMNodeList` implements `Traversable`. There are some `for` loops left but we cannot simply replace those: PHP follows the DOM specification, which requires that `NodeList` objects in the DOM are live. As a result, any operation that removes a node list member node from its parent (such as `removeChild`, `replaceChild` or `appendChild`) will cause the next node in the iterator to be skipped. We could work around that by converting those node lists to static arrays using `iterator_to_array` but not sure if it is worth it.	1 year ago
Jan Tojnar	d454c3a462	Remove dead iteration code This was forgotten in `b580cf216d`.	1 year ago
Jan Tojnar	8b1ef07401	Extract `for`-iterated items into variables This simplifies the code a bit and will make it slightly easier in case we decide to switch to `foreach` iteration.	1 year ago
Jan Tojnar	5885dbbe78	Remove pointless `stdClass` `DOMNode::$childNodes` always contained `DOMNodeList`.	1 year ago
Jérémy Benoist	6947999782	Merge pull request #92 from jtojnar/ci-fix ci: Fix & add PHP 8.4	1 year ago
Jan Tojnar	da755013aa	Remove extra set_error_handler callback argument It is unused and would cause an error on PHP ≥ 8.0: https://www.php.net/manual/en/function.set-error-handler.php#refsect1-function.set-error-handler-parameters Not sure if the handler is even necessary – it was introduced in `175196d6c2` but I did not manage to reproduce the original error (Entity 'nbsp' not defined). It was probably fixed by `f2a43b476c`.	1 year ago
Jan Tojnar	5b9551d1e3	ci: Add PHP 8.4 PHP 8.4 is in beta, with final version scheduled for November so it is time to start testing it.	1 year ago
Jan Tojnar	c7b10dcc45	Avoid E_STRICT constant It will be deprecated in PHP 8.4 and it is meaningless nowadays anyway: https://wiki.php.net/rfc/deprecations_php_8_4#remove_e_strict_error_level_and_deprecate_e_strict_constant The use of the constant was introduced in `175196d6c2`.	1 year ago
Jan Tojnar	80adfe870b	Fix coding style With php-cs-fixer 3.64.0, the `native_function_invocation` rule no longer passed.	1 year ago
Jérémy Benoist	cb6b6ac577	Merge pull request #88 from jtojnar/has-single-fix	2 years ago
Jan Tojnar	677f3f096e	Fix hasSingleTagInsideElement method It would fail for e.g. `<div> <p>foo</p> </div>`. mozilla/readability uses children for the tag lookup, which return only elements. PHP does not have children property so `b580cf216d` mistakenly used `childNodes` instead, but that can return any node type. Let’s filter the children ourselves. Also add comments from mozilla/readability’s `_hasSingleTagInsideElement`.	2 years ago
Jérémy Benoist	29122763db	Merge pull request #89 from jtojnar/php74 Require PHP 7.4	2 years ago
Jan Tojnar	89d3b74259	Rectorize to PHP 7.4 Switches to short anonymous function syntax.	2 years ago
Jan Tojnar	e792644fe8	Drop PHP < 7.4 support This will allow us to use flexible heredocs in test, as well as typed properties and other goodies. https://www.php.net/releases/7_3_0.php https://www.php.net/releases/7_4_0.php	2 years ago
Jan Tojnar	648d8c605b	Update coding style for upcoming PHP-CS-Fixer changes Once we bump minimum PHP version, we will get newer PHP-CS-Fixer, which will try to apply these cleanups. Also manually tweak anonymous functions so that they are cleanly formatted once we switch to `fn` syntax.	2 years ago
Jérémy Benoist	f28191a728	Merge pull request #86 from jtojnar/ci-bump ci: Update actions	2 years ago
Jan Tojnar	2103853a1b	ci: Bump coveralls to 2.7.0 - Fixes PHP 8 support https://github.com/php-coveralls/php-coveralls/releases/tag/v2.4.3	2 years ago
Jan Tojnar	7f4c6cfcbd	ci: Update actions Mostly just a nodejs bump: - https://github.com/actions/checkout/releases/tag/v4.0.0 - https://github.com/ramsey/composer-install/releases/tag/3.0.0	2 years ago
Jérémy Benoist	38870cdff1	Merge pull request #80 from jtojnar/stricter Fix some CI issues	3 years ago
Jan Tojnar	9bdd3b6b2e	ci: Add PHP 8.2 and 8.3	3 years ago
Jan Tojnar	f14428e4c0	Do not use `mb_convert_encoding` with `HTML-ENTITIES` as target encoding This is deprecated since PHP 8.2: Deprecated: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead It was used because `DOMDocument`, which uses libxml2 internally, will parse the HTML as ISO-8859-1, unless the document contains an XML encoding declaration or HTML meta tag setting character set. Since first such element wins, putting the `meta[charset]` up front will ensure the parser uses the correct encoding, even if the document contains incorrect meta tag (e.g. when the document is converted to UTF-8 without also updating the metadata by the software passing it to Readability). https://stackoverflow.com/a/39148511/160386	3 years ago
Jan Tojnar	23f824a1ce	tests: Fix “THE ERROR HANDLER HAS CHANGED!”	3 years ago
Jan Tojnar	2a57124528	composer: upgrade rector	3 years ago
Jan Tojnar	0975574bdb	Rector: Upgrade configuration	3 years ago
Jan Tojnar	9ed89bde92	Fix PHP-Cs-Fixer changes 1) src/Readability.php (braces, no_unneeded_control_parentheses, single_line_comment_spacing, global_namespace_import, no_unused_imports, phpdoc_align) 2) src/JSLikeHTMLElement.php (phpdoc_separation) Switch code blocks to Markdown syntax to work around `phpdoc_separation`, ApiGen uses Markdown these days anyway.	3 years ago
Jan Tojnar	2c6c6d5987	PHPStan: Use stable PHPUnit path phpunit-bridge will create a symlink.	3 years ago
Jan Tojnar	c5407ec07c	composer: Add scripts for development	3 years ago
Jérémy Benoist	7cd8476d38	Merge pull request #79 from j0k3r/fix/psr-log-2-3 Allow `psr/log` 2.0 & 3.0	4 years ago
Jeremy Benoist	82083c872b	Allow `psr/log` 2.0 & 3.0	4 years ago
Kevin Decherf	6689f19956	Strip script and style tags through ::clean() method instead of preg_replace Huge tags can lead to a failure of preg_replace, thus erasing the whole fetched content. Fixes https://github.com/wallabag/wallabag/issues/5847 Signed-off-by: Kevin Decherf <kevin@kdecherf.com>	4 years ago
Jérémy Benoist	0c0653dad6	Merge pull request #73 from Kdecherf/fix/impr Fix `isPhrasingContent` conditions, text node replacement	4 years ago
Kevin Decherf	2ab87d7445	Fix isPhrasingContent conditions, text node replacement It also disables reverting forced paragraph elements as it can break layouts or corrupt content. Signed-off-by: Kevin Decherf <kevin@kdecherf.com>	4 years ago
Jérémy Benoist	8af69ad68c	Merge pull request #71 from j0k3r/feature/enable-rector Add Rector	4 years ago
Jeremy Benoist	c2a1639b34	Add Rector	4 years ago
Jérémy Benoist	ccf1b336c5	Merge pull request #64 from Kdecherf/improvements	4 years ago
Kevin Decherf	a44c4e5482	Add routine to remove invisible nodes Readability was previously removing (was trying to actually, see next section) invisible nodes using a pattern from `unlikelyCandidates`. This was quite hacky and was removed during a backport of logics from mozilla/readability. There is still a need to remove them so here we are. We still use a pattern but specifically against the style attribute. We also remove nodes with the attribute `hidden`. The clean feature of tidy actually replaces inline style attributes with css classes thus preventing readability to detect invisible nodes, see https://github.com/htacg/tidy-html5/blob/5.6.0/src/clean.c#L1488 We therefore set clean configuration to false. Signed-off-by: Kevin Decherf <kevin@kdecherf.com>	4 years ago
Kevin Decherf	b580cf216d	Backport some logics from mozilla/readability This change backports several things from mozilla/readability: - Add child score to all ancestors instead of the first parent only - Check 5 top candidates and try to find alternative candidates within ancestors, this can help to find a better parent and grab more content - Reduce patterns from `unlikelyCandidates` to the one used by Mozilla as ours tend to remove useful nodes - Score headers (h2 to h6) by default in addition to div, p, td and section Signed-off-by: Kevin Decherf <kevin@kdecherf.com>	4 years ago
Jérémy Benoist	2e9349f076	Merge pull request #69 from j0k3r/feature/php-7.2 Require PHP >= 7.2	4 years ago
Jeremy Benoist	c4bba53dbe	Remove Scrutinizer	4 years ago

185 Commits (5040fc1587cde6c8d5ba9a9ee0e2ee13a3ea1cd3)
