Readability previously removed (or rather tried to remove, see next section) invisible nodes using a pattern from `unlikelyCandidates`. This was quite hacky and was dropped when logic was backported from mozilla/readability.

These nodes still need to be removed, so this change restores that behavior. A pattern is still used, but it is now matched specifically against the `style` attribute. Nodes with the `hidden` attribute are removed as well.

The `clean` feature of tidy replaces inline style attributes with CSS classes, which prevents Readability from detecting invisible nodes; see https://github.com/htacg/tidy-html5/blob/5.6.0/src/clean.c#L1488

The `clean` configuration option is therefore set to false.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>

pull/64/head
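
The approach described in the message above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this commit: the regular expression, the function names `is_invisible` and `remove_invisible_nodes`, and the use of `xml.etree.ElementTree` on a well-formed fragment are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern: the real project may match other declarations as well.
INVISIBLE_STYLE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


def is_invisible(element: ET.Element) -> bool:
    """True if the element has a `hidden` attribute or a hiding inline style."""
    if "hidden" in element.attrib:
        return True
    return bool(INVISIBLE_STYLE.search(element.get("style", "")))


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove `child` from `parent` while keeping the text that follows it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def remove_invisible_nodes(parent: ET.Element) -> int:
    """Remove invisible descendants in place and return how many were removed."""
    removed = 0
    # Iterate over a copy, since the tree is mutated during the walk.
    for child in list(parent):
        if is_invisible(child):
            _drop(parent, child)
            removed += 1
        else:
            removed += remove_invisible_nodes(child)
    return removed
```

The check only looks at the inline `style` attribute, which is why a tidy pass that moves inline styles into CSS classes would hide these nodes from it.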
parent b580cf216d
commit a44c4e5482
2 changed files with 66 additions and 1 deletion