WIP: 19 remove similar
This MR addresses #19
Added dependencies
- This MR brings in imagehash as a dependency.
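For reviewers less familiar with perceptual hashing, here is a minimal sketch of how "similar" can be decided from hashes. It assumes images are compared by Hamming distance between equal-length hex hash strings, such as those produced by `str(imagehash.phash(img))`. `find_similar` and the `max_distance` threshold are hypothetical and not necessarily what this MR implements.

```python
from typing import Dict, List, Tuple


def hamming_hex(a: str, b: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def find_similar(hashes: Dict[str, str], max_distance: int = 5) -> List[str]:
    """Return paths to remove (hypothetical helper).

    An image is marked for removal if its hash is within max_distance bits
    of an image that was already kept. Paths are visited in sorted order so
    the result is deterministic. This is O(n^2) in the number of images.
    """
    kept: List[Tuple[str, str]] = []
    to_remove: List[str] = []
    for path in sorted(hashes):
        h = hashes[path]
        if any(hamming_hex(h, k) <= max_distance for _, k in kept):
            to_remove.append(path)
        else:
            kept.append((path, h))
    return to_remove
```

With `imagehash` itself, `hash_a - hash_b` on two `ImageHash` objects already returns the Hamming distance. The pairwise scan above is the simplest option, but its quadratic cost is exactly what the large-dataset performance check should measure.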
TODO:
- Validate the approach against its intended use case ("similar" is very nuanced when working with ML image-similarity measures).
- Add an on-disk hash cache. Depending on the use case, keeping all state in memory might be a non-starter, so storing cached hashes (or image vectors/embeddings) on disk might be a better option.
- Test performance (memory and speed) on large datasets to validate usability.
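The on-disk cache item could start as something like the minimal sketch below, which uses the stdlib `sqlite3` module. `HashCache`, `get_or_compute` and the table schema are hypothetical names. The `compute` callable stands in for whatever hashing function is used, e.g. `lambda p: str(imagehash.phash(Image.open(p)))`.

```python
import os
import sqlite3
from typing import Callable


class HashCache:
    """Hypothetical on-disk cache mapping an image path to its hash string.

    An entry is reused only while the file's size and modification time
    (in nanoseconds) match the values recorded when it was hashed.
    """

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, hash TEXT)"
        )

    def get_or_compute(self, path: str, compute: Callable[[str], str]) -> str:
        st = os.stat(path)
        row = self.conn.execute(
            "SELECT size, mtime_ns, hash FROM hashes WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
            return row[2]  # cache hit: file unchanged since it was hashed
        value = compute(path)  # cache miss or stale entry: recompute
        self.conn.execute(
            "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
            (path, st.st_size, st.st_mtime_ns, value),
        )
        self.conn.commit()
        return value

    def close(self) -> None:
        self.conn.close()
```

Keying on size and mtime avoids re-reading unchanged files. Keying on a content digest would survive renames but costs a full read of every file on each run. The same table could hold embeddings (e.g. as a BLOB column) if the approach moves beyond hashes.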