Draft: Make all search fields in OpenLP ignore diacritics/accents (!579) · Merge requests · openlp / OpenLP

Mateus Meyer Jiacomelli requested to merge mateusmeyer/openlp:ignore_diacritics_search into master Feb 14, 2023

Fixes #192. Also fixes https://forums.openlp.org/discussion/3477/ignore-accents-in-the-query/p1 and https://forums.openlp.org/discussion/2457/feature-request-accent-and-punctuation-insensitive-search/p1

This was a constant problem at my current church and at other churches (and probably for any OpenLP user whose language uses accents/diacritics): OpenLP required the user to type the exact accents for a search to succeed, and some users typed words without accents. A user looking for a song would then fail to find it and accidentally create a duplicate.

This MR lets the user ignore accents/diacritics in every field that supports search in OpenLP. The main stumbling block is that SQLite doesn't support custom collations for LIKE queries (which most OpenLP search queries use).
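To illustrate the stumbling block, here is a minimal sketch using only Python's standard library. The table, the NOACCENT collation name and the strip_accents() helper are hypothetical and are not taken from the MR; the point is only that a custom collation affects `=` comparisons but has no effect on LIKE.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    """Hypothetical helper: drop combining marks and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def noaccent_collation(a, b):
    """Collation callback: compare two strings with accents removed."""
    a, b = strip_accents(a), strip_accents(b)
    return (a > b) - (a < b)


conn = sqlite3.connect(":memory:")
conn.create_collation("NOACCENT", noaccent_collation)
conn.execute("CREATE TABLE songs (title TEXT)")
conn.execute("INSERT INTO songs VALUES ('Canção')")

# Equality honours the custom collation, so the accent-less term matches.
eq_rows = conn.execute(
    "SELECT title FROM songs WHERE title = ? COLLATE NOACCENT", ("cancao",)
).fetchall()

# LIKE does not consult the collation, so the same term finds nothing.
like_rows = conn.execute(
    "SELECT title FROM songs WHERE title LIKE ? COLLATE NOACCENT", ("%cancao%",)
).fetchall()
```

This is why the MR goes through a custom SQL function rather than a collation.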

The MR changes the following:

Modifies the DB connection to add a custom DB function called normalize (only when the connection is SQLite);
Implements the can_ignore_diacritics() check to know whether the current OpenLP instance is allowed to apply any routine to ignore accents;
Implements the normalize_diacritics() function, which runs whenever the custom DB function is called. An LRU cache is applied to help in cases where many strings (like songbooks and authors) are repeated. This function tries to use PyICU and falls back to unicodedata if that fails (PyICU transliteration is actually faster than unicodedata);
Creates a section called 'Search' on the General tab of the Settings, to group search-related settings and to declutter the UI Settings section a little. The option to disable the accent-ignoring behavior is there, alongside the 'Search as you Type' option;
Changes all relevant DB queries to take into account the result of can_ignore_diacritics() and apply the function to the query if it's enabled;
Creates a set_case_insensitive_ignore_diacritics_completer() function to apply the accent-ignoring rules to autocomplete fields as well.
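The DB-side part of the list above could look roughly like the following. This is a simplified sketch under stated assumptions, not the MR's actual code: it uses only the unicodedata path (the PyICU path is omitted), the cache size is arbitrary, and the songs table, the find_titles() helper and its ignore_diacritics flag (standing in for the result of can_ignore_diacritics()) are hypothetical.

```python
import sqlite3
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)  # assumed size; repeated strings are served from the cache
def normalize_diacritics(text):
    """Return text with combining marks removed (unicodedata path only)."""
    if text is None:  # SQLite passes NULL as None
        return None
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_titles(conn, term, ignore_diacritics):
    """Hypothetical search helper; the flag stands in for can_ignore_diacritics()."""
    if ignore_diacritics:
        sql = "SELECT title FROM songs WHERE normalize(title) LIKE normalize(?) ORDER BY title"
    else:
        sql = "SELECT title FROM songs WHERE title LIKE ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function named normalize.
conn.create_function("normalize", 1, normalize_diacritics)
conn.execute("CREATE TABLE songs (title TEXT)")
conn.executemany("INSERT INTO songs VALUES (?)", [("Canção Nova",), ("Aleluia",)])

with_option = find_titles(conn, "cancao", ignore_diacritics=True)
without_option = find_titles(conn, "cancao", ignore_diacritics=False)
```

Because normalize() is applied to the column on every row, SQLite cannot use an index for these queries, which is consistent with the slowdown discussed in the open questions.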

These are the main open questions:

It's inevitable that this new feature will carry a performance penalty. For our library of ~500 songs, it took something like 8x the original time to find the results (the original searches averaged about 10ms and the new ones average ~~80-100ms~~50-90ms after the last commit). Is there anything we can do to improve this, even by a little?
~~It's enabled by default. Should we disable it by default?~~ Will be off by default
If we choose to disable it by default, how will we let the user know that this feature exists and how to enable it? Maybe a modal on upgrade suggesting that the user activate the option if they use a language that has accents/diacritics?
See how the Duplicate Song Finder works (or doesn't work) with this.

Edited Feb 16, 2023 by Mateus Meyer Jiacomelli
