Fuzzy hash / text similarity helper

Right now 'closeable' uses a CRC24 hash generated from the lowercased text content to identify matching tooltips/reminders. Lowercasing the text before hashing ignores capitalization-only revisions, which helps but isn't great: any other edit, however small, still changes the hash.
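
For reference, the current behaviour can be sketched roughly as below. This is an illustration only, not the actual 'closeable' code: it is written in Python for brevity, it assumes the OpenPGP CRC-24 variant (the ticket does not say which CRC24 is used), and `crc24` / `closeable_key` are hypothetical names.

```python
def crc24(data: bytes) -> int:
    """CRC-24 as specified for OpenPGP (RFC 4880); the variant is an assumption."""
    crc = 0xB704CE  # OpenPGP initial value
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB  # OpenPGP generator polynomial
    return crc & 0xFFFFFF


def closeable_key(text: str) -> str:
    """Hypothetical storage key: CRC-24 of the lowercased text, as 6 hex digits."""
    return f"{crc24(text.lower().encode('utf-8')):06x}"


# Capitalization changes map to the same key...
print(closeable_key("Close This Tip") == closeable_key("close this tip"))
# ...but a one-letter edit yields an unrelated key, so the tip would reset.
print(closeable_key("close this tip") == closeable_key("close this tap"))
```

Because a CRC carries no notion of distance, two keys can only be compared for equality, which is the limitation this ticket is about.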

Ideally I'd like a fuzzy hash function that can return a 'difference %' between two hashes without needing the original content. This would allow us to set clear reset thresholds for things like closeable tip text.

It would be:

  • a fuzzy hash, CTPH (context triggered piecewise hashing) or similar
  • a relatively small piece of code (the current implementation is ~230 characters)
  • reasonably performant for up to a paragraph of input text. Although it isn't meant for real-time operations, it runs as part of PD init to re-close previously closed tips.
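
One candidate for the "or similar" option is SimHash, sketched below. To be clear, this is not CTPH: it is a different locality-sensitive hash, swapped in here because it is small and produces fixed-size hashes that can be compared without the original text. The 64-bit width, the character trigrams, the use of BLAKE2b, and the names `simhash` / `difference_percent` are all assumptions made for illustration, and Python is used only for brevity.

```python
import hashlib

BITS = 64  # assumed hash width; 64 bits encode to 16 hex characters


def _shingles(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    if len(norm) <= n:
        return [norm]
    return [norm[i:i + n] for i in range(len(norm) - n + 1)]


def simhash(text: str, n: int = 3) -> str:
    """SimHash of the text as a 16-character hex string."""
    counts = [0] * BITS
    for shingle in _shingles(text, n):
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(BITS):
            # Each shingle votes +1 or -1 on every bit position.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    value = sum(1 << bit for bit in range(BITS) if counts[bit] > 0)
    return f"{value:016x}"


def difference_percent(hash_a: str, hash_b: str) -> float:
    """Share of differing bits between two hashes, as a percentage (0 to 100)."""
    differing = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 100.0 * differing / BITS


old = simhash("You can close this reminder at any time.")
new = simhash("You can close this reminder at any time!")
print(old, new, difference_percent(old, new))
```

Texts that share most of their trigrams tend to differ in only a few bits, while unrelated texts tend to differ in a much larger share of them, so the percentage can act as the 'difference %'. How well that holds for very short tips would need checking against real tip text.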

Should:

  • make a tooltip/reminder reappear only if its content has changed significantly
  • generate hashes of a reasonable size (120 chars? flexible) that work as storage keys
  • allow a custom diff threshold
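
A rough sketch of how the threshold check could look at init time is below. It assumes the fuzzy hashes are equal-length hex strings compared by Hamming distance, and that the hashes of previously closed tips are available as a collection. The names `hamming_percent` and `stays_closed` and the 25% default are hypothetical, and Python is used only for illustration.

```python
from typing import Iterable


def hamming_percent(hash_a: str, hash_b: str) -> float:
    """Percentage of differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    total_bits = len(hash_a) * 4  # each hex digit carries 4 bits
    differing = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 100.0 * differing / total_bits


def stays_closed(current_hash: str,
                 closed_hashes: Iterable[str],
                 threshold_percent: float = 25.0) -> bool:
    """True if the tip's current text is still close enough to a closed one."""
    return any(
        hamming_percent(current_hash, stored) < threshold_percent
        for stored in closed_hashes
    )


closed = {"0000000000000000"}  # hashes stored when the user closed a tip
print(stays_closed("00000000000000ff", closed))  # 8 of 64 bits differ
print(stays_closed("00000000ffffffff", closed))  # 32 of 64 bits differ
```

One design consequence: with a fuzzy hash, an exact storage-key lookup no longer matches edited text, so the check has to scan the stored hashes (or keep a stable tip id as the key and store the hash as its value).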
Edited by Lorin Halpert