readability: stop tidy from wrapping noscript text

HTML 4.01 Strict only allows block-level elements within `noscript`, `form` and
`blockquote`. The `enclose-block-text` option fixes the cases where those
elements contain inline elements or text by wrapping the children in paragraphs.

HTML 5 has a looser content model and allows `noscript` elements basically
anywhere, including inside paragraphs, in which case the `noscript` element
inherits its parent element’s content model. This means that, for
`p > noscript > text`, tidy will produce invalid HTML with nested paragraphs,
even though that structure, while invalid on two counts under the HTML 4.01
Strict profile, is completely valid in HTML 5.
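For illustration, here is a minimal sketch of the structure described above. The
text content is a placeholder rather than the actual WordPress markup, and the
second fragment assumes `enclose-block-text` wraps the `noscript` text in a
paragraph as described:

```html
<!-- Input: valid HTML 5; noscript inherits the paragraph's content model -->
<p><noscript>Fallback text</noscript></p>

<!-- After tidy with enclose-block-text enabled: a paragraph nested in a paragraph -->
<p><noscript><p>Fallback text</p></noscript></p>
```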

Popular WordPress image lazy-loading code produces precisely that structure,
so tidy “corrects” it into invalid code. A proper HTML parser would
force-close the outer paragraph when reading that code, making the `noscript`
element its sibling instead of its child. The only reason this does not break
Graby’s code for stripping the lazy-loading HTML is that libxml2 contains a bug
that counteracts this:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/205

Since all three elements allow flow content in HTML 5, it no longer makes much
sense to enable this option. The only possible issues are producing HTML that
does not conform to 4.01 Strict, which was never guaranteed anyway, as our
example shows, and blockquotes containing text nodes not wrapped in paragraphs,
which some ancient stylesheets might expect. The latter is only a minor and
easily fixable visual backwards incompatibility.
Ref: pull/60/head
Author: Jan Tojnar (committed by GitHub 5 years ago)
Parent: c6425cc28b
Commit: 7cea79c23a
src/Readability.php

@@ -116,7 +116,6 @@ class Readability implements LoggerAwareInterface
 'drop-empty-paras' => true,
 'drop-proprietary-attributes' => false,
 'enclose-text' => true,
-'enclose-block-text' => true,
 'merge-divs' => true,
 // 'merge-spans' => true,
 'input-encoding' => '????',
