`isset` does two things: it checks that the entity at the end of the accessor chain is defined, and that it is not `null`.
We know that the properties are always defined so we can just replace those `isset`s with `null` checks.
This will allow us to reason about the code more clearly.
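As a minimal sketch of the difference (the `TreeItem` class and its `$parent` property are hypothetical and only serve as an illustration):

```php
<?php
declare(strict_types=1);

// Hypothetical class: the property is always defined but may hold null.
final class TreeItem
{
    public ?TreeItem $parent = null;
}

$item = new TreeItem();

// isset() answers "defined AND not null" in a single step,
// so it also stays silent when the property name is misspelled.
$viaIsset = isset($item->parent);   // false, because the value is null
$viaTypo = isset($item->parnet);    // false as well, and no warning is raised

// An explicit null check only answers "not null" and would
// raise a warning for an undefined property such as $item->parnet.
$viaNullCheck = $item->parent !== null;   // false
```

When the property is known to exist, the explicit check says exactly what is being tested and lets tooling catch typos.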
We use `DOMDocument::registerNodeClass()` to make DOM methods return
`JSLikeHTMLElement` instead of `DOMElement`. Unfortunately, it is not
possible for PHPStan to detect that so we need to cast it ourselves:
https://github.com/phpstan/phpstan/discussions/10748
We may want to deprecate it in the future just to get rid of this mess.
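A rough illustration of the problem, assuming a stand-in subclass named `ReadableElement` (hypothetical; the real class is `JSLikeHTMLElement`) and an inline `@var` annotation as one possible way of casting:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for JSLikeHTMLElement.
class ReadableElement extends DOMElement
{
}

$dom = new DOMDocument();
// From now on, element nodes of this document are exposed as ReadableElement.
$dom->registerNodeClass(DOMElement::class, ReadableElement::class);
$dom->loadHTML('<p>hi</p>');

// Static analysis sees the declared DOM return type here,
// so the more specific class has to be spelled out by hand.
/** @var ReadableElement $node */
$node = $dom->getElementsByTagName('p')->item(0);

$isCustom = $node instanceof ReadableElement;   // true at run time
```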
Also add PHPStan stubs for DOM classes so that we do not need to cast everything.
It is fine to do that globally as we only ever use DOM with `JSLikeHTMLElement` registered.
This patch also allows us to get rid of the assertions in tests.
Since we require PHP 7.4, contravariance in param types is supported,
so we do not need to worry about subclasses that widen the param type.
It will only be a breaking change in the unlikely case that a subclass uses
a type that contradicts the PHPDoc type annotation and does not extend `DOMNode`.
Also fix the type annotation since some invocations pass it a `DOMText`,
an arbitrary sibling/child `DOMNode` or even `null`.
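A small sketch of why widening is safe on PHP 7.4 (class and method names are hypothetical and do not come from the library):

```php
<?php
declare(strict_types=1);

// Hypothetical parent with a native parameter type.
class BaseDescriber
{
    public function describe(?DOMElement $node): string
    {
        return $node === null ? 'none' : $node->nodeName;
    }
}

// A subclass may widen the parameter to a supertype (contravariance).
class WideDescriber extends BaseDescriber
{
    public function describe(?DOMNode $node): string
    {
        return $node === null ? 'none' : $node->nodeName;
    }
}

$describer = new WideDescriber();
$forText = $describer->describe(new DOMText('foo'));   // '#text'
$forNull = $describer->describe(null);                 // 'none'
```

A subclass that narrowed the type instead, or used an unrelated class, would be rejected when the class is declared.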
`DOMAttr::$value` must be a `string`.
Let’s add helpers for manipulating the `readability` attribute
so that we do not have to keep casting it from and to `string`
in order to appease `strict_types`.
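Such helpers could look roughly like this. The function names, the use of `float` and the default of `0.0` are assumptions made for the sketch, not the library's actual API:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: read the score, treating a missing attribute as 0.
function getReadabilityScore(DOMElement $element): float
{
    return $element->hasAttribute('readability')
        ? (float) $element->getAttribute('readability')
        : 0.0;
}

// Hypothetical helper: the only place that converts the number to a string.
function setReadabilityScore(DOMElement $element, float $score): void
{
    $element->setAttribute('readability', (string) $score);
}

$dom = new DOMDocument();
$div = $dom->createElement('div');

setReadabilityScore($div, 1.5);
setReadabilityScore($div, getReadabilityScore($div) + 2);

$score = getReadabilityScore($div);   // 3.5
```

Callers then work with numbers only, and the string conversion lives in one place.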
`DOMNodeList` implements `Traversable`.
There are some `for` loops left but we cannot simply replace those:
PHP follows the DOM specification, which requires that `NodeList`
objects in the DOM are live. As a result, any operation that removes
a node list member node from its parent (such as `removeChild`,
`replaceChild` or `appendChild`) will cause the next node
in the iterator to be skipped.
We could work around that by converting those node lists to static arrays
using `iterator_to_array`, but we are not sure it is worth it.
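For reference, the snapshot workaround would look something like this (a sketch on a made-up document, not code from the library):

```php
<?php
declare(strict_types=1);

$dom = new DOMDocument();
$dom->loadXML('<root><a/><b/><c/><d/></root>');
$root = $dom->documentElement;

// Copy the live list into a plain array first, so that removing
// nodes from the parent cannot disturb the iteration.
$removed = [];
foreach (iterator_to_array($root->childNodes) as $child) {
    $removed[] = $child->nodeName;
    $root->removeChild($child);
}

$remaining = $root->childNodes->length;   // 0, every child was visited
```

The cost is one extra array per loop, which is the trade-off mentioned above.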
It would fail for e.g. `<div> <p>foo</p> </div>`.
mozilla/readability uses `children` for the tag lookup, which returns only elements.
PHP does not have a `children` property, so b580cf216d
mistakenly used `childNodes` instead, but that can return any node type.
Let’s filter the children ourselves.
Also add comments from mozilla/readability’s `_hasSingleTagInsideElement`.
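The filtering can be sketched as follows (`elementChildren` is a hypothetical name, and the example uses `loadXML` only to keep the input small):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: emulate the DOM `children` property
 * by keeping only element nodes.
 *
 * @return DOMElement[]
 */
function elementChildren(DOMNode $node): array
{
    $children = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = $child;
        }
    }

    return $children;
}

$dom = new DOMDocument();
$dom->loadXML('<div> <p>foo</p> </div>');
$div = $dom->documentElement;

// childNodes also contains the whitespace text nodes around <p>.
$allNodes = $div->childNodes->length;

// Only the <p> element is left after filtering.
$elements = elementChildren($div);
```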
Once we bump the minimum PHP version, we will get a newer PHP-CS-Fixer,
which will try to apply these cleanups.
Also manually tweak anonymous functions so that they are cleanly formatted
once we switch to `fn` syntax.
This has been deprecated since PHP 8.2:
Deprecated: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead
It was used because `DOMDocument`, which uses libxml2 internally, will parse the HTML as ISO-8859-1, unless the document contains an XML encoding declaration or HTML meta tag setting character set.
Since the first such element wins, putting the `meta[charset]` up front will ensure the parser uses the correct encoding, even if the document contains an incorrect meta tag (e.g. when the document is converted to UTF-8 without also updating the metadata by the software passing it to Readability).
https://stackoverflow.com/a/39148511/160386
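A minimal sketch of the idea, assuming the input is already UTF-8 and that the libxml2 in use recognizes the `meta[charset]` form. The exact markup that is prepended here is an assumption, not necessarily what the patch uses:

```php
<?php
declare(strict_types=1);

// UTF-8 input that carries no charset metadata of its own.
$html = '<p>Příliš žluťoučký kůň</p>';

// Collect parser diagnostics instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// The prepended declaration is the first one the parser encounters,
// so it decides how the remaining bytes are decoded.
$dom->loadHTML('<meta charset="utf-8">' . $html);

$text = $dom->getElementsByTagName('p')->item(0)->textContent;
```

Without the prepended tag, the same bytes would be read as ISO-8859-1 and the accented characters would come out garbled.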
1) src/Readability.php (braces, no_unneeded_control_parentheses, single_line_comment_spacing, global_namespace_import, no_unused_imports, phpdoc_align)
2) src/JSLikeHTMLElement.php (phpdoc_separation)
Switch code blocks to Markdown syntax to work around `phpdoc_separation`; ApiGen uses Markdown these days anyway.