
use diff_match_patch to detect if anonymization resulted in important changes

Dan Crosta requested to merge better-diff-detection into master

"important" changes are changes to the content, ignoring (most) whitespace. we currently flag only the whitespace changes that definitely affect markdown, for the sorts of markdown we're likely to get: newlines (i.e. paragraph breaks) and whitespace after newlines (which might affect list formatting).
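A minimal sketch of that rule in Python, to make the classification concrete. To stay dependency-free it swaps in the standard library's `difflib.SequenceMatcher` for diff_match_patch, so the chunk boundaries may differ from what diff_match_patch would produce. `is_important_change` and `_span_is_important` are hypothetical names, and "whitespace after newlines" is interpreted here (an assumption) as any whitespace change inside a line's leading indentation, including on the first line.

```python
import difflib


def _span_is_important(text: str, start: int, end: int) -> bool:
    """Decide whether the changed span text[start:end] matters for markdown."""
    chunk = text[start:end]
    if chunk.strip():
        # Any non-whitespace change is a content change.
        return True
    if "\n" in chunk:
        # Added or removed newlines can split or merge paragraphs.
        return True
    # Whitespace-only change within a single line: treat it as important only
    # if everything before it on that line is also whitespace, i.e. it changes
    # the line's indentation (assumed reading of "whitespace after newlines").
    line_start = text.rfind("\n", 0, start) + 1
    return text[line_start:start].strip() == ""


def is_important_change(before: str, after: str) -> bool:
    """Return True if `after` differs from `before` in a markdown-relevant way.

    Uses stdlib difflib as a stand-in for diff_match_patch.
    """
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A "replace" has both a removed span (in before) and an added span
        # (in after); "delete"/"insert" leave one side empty, skipped below.
        for text, start, end in ((before, i1, i2), (after, j1, j2)):
            if start < end and _span_is_important(text, start, end):
                return True
    return False
```

For example, doubling a space between two words is ignored, while indenting a list item under the previous one, or merging two paragraphs, is flagged.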

this is starting to feel like guessing what matters to markdown; if we want to go any further than this, another idea I had was to detect whether the changes affect the tree of HTML elements (render markdown -> html; parse html into a tree of some sort; then compare the structure of the tree before/after). shout if you think that's a good idea.
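The HTML-tree idea could look roughly like the sketch below. The markdown-to-HTML rendering step is assumed to happen elsewhere (no particular renderer is implied); this only compares the element structure of two already-rendered HTML strings using the standard library's `html.parser.HTMLParser`. Instead of building an explicit tree it records the sequence of open/close tag events, which captures nesting for well-formed HTML. Text content and attributes are ignored. All names here are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class _StructureCollector(HTMLParser):
    """Records the open/close tag sequence of an HTML fragment, ignoring text."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately dropped; only element names matter here.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_structure(html: str) -> List[Tuple[str, str]]:
    """Flatten the element tree into a list of ("start"|"end", tag) events."""
    collector = _StructureCollector()
    collector.feed(html)
    collector.close()
    return collector.events


def structure_changed(before_html: str, after_html: str) -> bool:
    """True if the element structure (tags and nesting) differs."""
    return html_structure(before_html) != html_structure(after_html)
```

With this, replacing a word inside a `<p>` counts as no structural change, while turning a sibling `<li>` into a nested list does.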

Edited by Dan Crosta
