Add Levenshtein ratio to find_common_lines (!443) · Merge requests · GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway

Andras Herczeg requested to merge 308-jaccard-index-similarity into main Oct 31, 2023

What does this merge request do and why?

Add a Levenshtein ratio to the find_common_lines functions so that we can group very similar lines as well as exact matches. The Levenshtein algorithm was chosen because it seemed to perform better than the Jaccard index and to run faster than the built-in difflib.SequenceMatcher.
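
For reviewers unfamiliar with the idea, here is a minimal sketch of grouping lines by a Levenshtein-based similarity ratio. It is not the code from this merge request: the function names (`levenshtein_distance`, `similarity_ratio`, `group_similar_lines`), the `threshold` parameter, and the normalization by the longer string's length are all assumptions made for illustration. Libraries that provide a Levenshtein ratio may normalize differently.

```python
from typing import List


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Hypothetical ratio in [0, 1]: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


def group_similar_lines(lines: List[str], threshold: float = 0.9) -> List[List[str]]:
    """Greedily put each line into the first group whose first member is similar enough."""
    groups: List[List[str]] = []
    for line in lines:
        for group in groups:
            if similarity_ratio(line, group[0]) >= threshold:
                group.append(line)
                break
        else:
            groups.append([line])  # no similar group found, start a new one
    return groups
```

With a threshold of 0.8, `group_similar_lines(["import os", "import os ", "return x"], threshold=0.8)` puts the two `import os` variants in one group and `return x` in its own, whereas an exact-match comparison would keep all three apart.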

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Closes #308 (closed)

Edited Nov 01, 2023 by Andras Herczeg
