Use our own parser to check that documents are bleached
Instead of running bleach on a document a second time and checking whether the output is sufficiently similar to the input, validate the HTML directly with a new, purpose-built validator:
https://github.com/remram44/bleached
I don't know how much I trust myself for security here, but the subset I accept is so small that it should be ok.
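
To illustrate the idea of validating against a tiny allowlist rather than re-sanitizing and diffing, here is a minimal sketch built on the standard library's `html.parser`. It is not the code from the linked repository: the allowed tags and attributes, and the names `StrictValidator`, `InvalidHTML` and `is_bleached`, are assumptions made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical allowlist: tag -> allowed attribute names.
# The subset actually accepted by the project may differ.
ALLOWED_TAGS = {
    "p": set(),
    "br": set(),
    "strong": set(),
    "em": set(),
    "a": {"href"},
}
VOID_TAGS = {"br"}  # tags that never take a closing tag


class InvalidHTML(ValueError):
    """Raised as soon as the document leaves the accepted subset."""


class StrictValidator(HTMLParser):
    def __init__(self):
        # Keep character references unconverted so that a raw "<" in text
        # can be told apart from an escaped "&lt;".
        super().__init__(convert_charrefs=False)
        self.stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            raise InvalidHTML("tag not allowed: %s" % tag)
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag]:
                raise InvalidHTML("attribute not allowed: %s" % name)
            if name == "href" and not (value or "").startswith(
                ("http://", "https://")
            ):
                raise InvalidHTML("URL scheme not allowed")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # This sketch rejects self-closing syntax such as <br/> outright.
        raise InvalidHTML("self-closing syntax not accepted")

    def handle_endtag(self, tag):
        # Closing tags must match the innermost open tag exactly.
        if not self.stack or self.stack[-1] != tag:
            raise InvalidHTML("unexpected closing tag: %s" % tag)
        self.stack.pop()

    def handle_data(self, data):
        # A sanitizer would have escaped a bare "<" in text.
        if "<" in data:
            raise InvalidHTML("unescaped '<' in text")

    def handle_comment(self, data):
        raise InvalidHTML("comments not allowed")

    def handle_decl(self, decl):
        raise InvalidHTML("declarations not allowed")

    def handle_pi(self, data):
        raise InvalidHTML("processing instructions not allowed")

    def unknown_decl(self, data):
        raise InvalidHTML("unknown declarations not allowed")


def is_bleached(document):
    """Return True if the document stays inside the accepted subset."""
    validator = StrictValidator()
    try:
        validator.feed(document)
        validator.close()
    except InvalidHTML:
        return False
    # Anything still open at the end means the markup was not balanced.
    return not validator.stack
```

With this sketch, `is_bleached('<p>Hello <strong>world</strong></p>')` is accepted, while a document containing `<script>` or an `onclick` attribute is rejected. The design choice is to fail closed: anything the validator does not explicitly recognize is an error, which is what keeps the accepted subset small enough to reason about.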