tidy: use tidy_repair_string instead of tidy_parse_string+tidy_clean_repair

A change released in tidy 5.6.0 breaks php-tidy when tidy_parse_string
and tidy_clean_repair are used with wrap=0: every single word is
incorrectly wrapped. Also, $tidy->value apparently should not be used to
retrieve the repaired HTML, as it is undocumented and meant for internal
use.

We replace these calls with tidy_repair_string, which directly returns the
repaired string.

Relates to https://github.com/htacg/tidy-html5/issues/673
Relates to https://bugs.php.net/bug.php?id=75947

Tests pass.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
Author: Kevin Decherf (7 years ago), committed by Jeremy Benoist
Commit: 26c881d864 (parent db69fe59a2, ref pull/42/head)
GPG Key ID: BCA73962457ACC3C (no known key found for this signature in database)
src/Readability.php | 6

@@ -274,10 +274,10 @@ class Readability implements LoggerAwareInterface
         if ($this->useTidy) {
             $this->logger->debug('Tidying document');
-            $tidy = tidy_parse_string($this->html, $this->tidy_config, 'UTF8');
-            if (tidy_clean_repair($tidy)) {
+            $tidy = tidy_repair_string($this->html, $this->tidy_config, 'UTF8');
+            if (false !== $tidy && $this->html !== $tidy) {
                 $this->tidied = true;
-                $this->html = $tidy->value;
+                $this->html = $tidy;
                 $this->html = preg_replace('/[\r\n]+/is', "\n", $this->html);
             }
             unset($tidy);

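To try the new call outside Readability, the patched logic can be sketched as a standalone PHP helper. This is a minimal sketch, assuming the tidy extension (ext-tidy) is installed; the helper name tidyHtml, the config array and the sample input are hypothetical and not part of the commit. It relies only on tidy_repair_string(), which returns the repaired markup as a string or false on failure.

```php
<?php
// Hypothetical helper mirroring the patched branch of Readability.
// Assumes ext-tidy is loaded; tidyHtml is an illustrative name only.
function tidyHtml(string $html, array $config): ?string
{
    // Parse and repair in one step; no tidy object and no ->value access.
    $repaired = tidy_repair_string($html, $config, 'UTF8');

    // Same guard as the patch: failure or an unchanged document means
    // "not tidied", so the caller keeps the original HTML.
    if (false === $repaired || $repaired === $html) {
        return null;
    }

    // Collapse runs of CR/LF into a single newline, as Readability does.
    return preg_replace('/[\r\n]+/', "\n", $repaired);
}

// Illustrative config: wrap=0 disables line wrapping entirely.
$config = ['wrap' => 0, 'show-body-only' => true];
$out = tidyHtml('<p>one two three four</p>', $config);
echo $out;
```

The i and s modifiers in the original pattern have no effect on a character class containing only CR and LF, so they are omitted in this sketch.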