@jonknu as Heinrich mentioned, this was an intentional change. Partial-word searches don't work very well with PG full-text search. @engwan do we currently support prefix searching? That might help a little with some of the fringe cases.
We currently do not. PG full-text search does support prefix searches, but it also has its problems due to word stemming.
For example, say an issue contains the word "testing". When this is indexed, it is stored as "test" in the DB. This stemming (converting words to their "root words") is done so that plurals and other variations of the word still match.
Now, if I do a prefix search for "testi*", it would not match the issue above, because the stored lexeme "test" does not start with "testi".
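The stemming problem can be reproduced with a toy model. This is a minimal sketch assuming a hypothetical, deliberately naive suffix-stripping stemmer (`toy_stem`) and plain whitespace tokenization; it is not PostgreSQL's Snowball English stemmer, and real lexemes may differ.

```python
# Toy model of a full-text index: each document is stored as a set of stemmed
# lexemes instead of its original text. toy_stem is a hypothetical, naive
# stemmer for illustration only, NOT PostgreSQL's Snowball English stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_lexemes(text: str) -> set[str]:
    """Tokenize on whitespace and stem each token; this is what gets indexed."""
    return {toy_stem(token) for token in text.split()}


def word_match(lexemes: set[str], word: str) -> bool:
    """Normal full-text match: the query word is stemmed the same way."""
    return toy_stem(word) in lexemes


def prefix_match(lexemes: set[str], prefix: str) -> bool:
    """Prefix search: the raw prefix is compared against the stored lexemes."""
    return any(lexeme.startswith(prefix.lower()) for lexeme in lexemes)


doc = to_lexemes("Testing the login flow")
print(sorted(doc))                 # ['flow', 'login', 'test', 'the']
print(word_match(doc, "tests"))    # True: "tests" and "testing" share the stem "test"
print(prefix_match(doc, "testi"))  # False: "test" does not start with "testi"
```

A shorter prefix such as "tes" would still match, so the failure only shows up once the typed prefix runs past the end of the stored stem.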
@gweaver @nickleonard thoughts on having a group/project-level setting to always use the old pattern-based search?
I think having a filter bar option to switch as needed on a per-search basis would be great. But it could also be confusing, because when users search they don't really want to care about how the search is done.
@engwan I think a setting would confuse more people than it would help. We really ought to think about the longer-term plans for search in general and how we can merge PG full-text search with Elasticsearch (ES). There are lots of hurdles to overcome, though:
- Reduce index time or find an alternative solution to allow recently created resources to appear in search results.
- Make ES always available for all instances so we don't have to maintain two different search implementations.
- Full support for existing filtering/sort behavior within the filter bar.
FYI @JohnMcGuire as I know you've been thinking about this.
The issue list page filter bar was using ILIKE queries for plain-text search.
This was not performant, and searching for arbitrary phrases on the issue list regularly resulted in a 5xx timeout. In practice, plain-text search from the issue list got slower linearly with the number of issues being searched, and would often just break or time out.
We implemented PG full-text search, which fixes the timeout problems but is a bit more "limiting" than a traditional ILIKE query.
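A rough sketch of why the two approaches scale differently, assuming an in-memory list of issue titles and hypothetical names (`ilike_scan`, `InvertedIndex`). An unanchored pattern like `ILIKE '%term%'` has to test every row, since a plain B-tree index cannot serve a leading wildcard. A full-text index is conceptually an inverted index from tokens to rows (in PostgreSQL usually a GIN index over a `tsvector`), so a lookup touches only the matching entries. The sketch skips stemming entirely.

```python
from collections import defaultdict


def ilike_scan(titles: list[str], term: str) -> list[int]:
    """Like ILIKE '%term%': a case-insensitive substring test on every row."""
    needle = term.lower()
    return [row for row, title in enumerate(titles) if needle in title.lower()]


class InvertedIndex:
    """Maps each lowercased whitespace token to the rows that contain it."""

    def __init__(self, titles: list[str]) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        for row, title in enumerate(titles):
            for token in title.lower().split():
                self.postings[token].add(row)

    def search(self, word: str) -> list[int]:
        # A single dictionary lookup instead of a pass over every title.
        return sorted(self.postings.get(word.lower(), set()))


titles = ["Login page times out", "Add dark mode", "Timeout on issue list"]
index = InvertedIndex(titles)
print(ilike_scan(titles, "time"))  # [0, 2]: substring hits inside "times" and "timeout"
print(index.search("time"))        # []: no title contains the whole token "time"
print(index.search("timeout"))     # [2]
```

The same example shows the trade-off: the token lookup avoids the full scan, but searching for "time" no longer finds "times" or "timeout", which the substring scan did.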
I understand the choice of full-text search, but it is rather problematic for certain locales. In Dutch and German (and maybe other languages too), many nouns are combined into single compound words. This makes it almost impossible to find the right issues when searching for specific terms, which would work if the words were English.
For example, the German word for occupational accident insurance law is Arbeiterunfallversicherungsgesetz. Searching for neither the prefix (arbeit), the suffix (gesetz), nor a middle part (versicherung) returns the issue containing Arbeiterunfallversicherungsgesetz. In English we would simply search for occupational, law, or insurance and easily get hits, because the words are separated by whitespace.
Global search (/) can still match parts of words, but it is not the most convenient tool when you are already looking at a list of issues. Subfiltering remains a problem in these locales.
You can still disable this behavior by disabling the issues_full_text_search feature flag.
I will try that out, thanks.
It's currently set to use the English dictionary. I also tried the German dictionary, but Postgres does not split this word either.
That makes sense. Many compound words are also simply non-dictionary, context-specific words. My German example is just one of many possible combinations; I don't think PG can index every possible stem.
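Splitting a compound needs a lexicon of its parts, which is why context-specific, non-dictionary words are hard to handle. A minimal greedy decompounding sketch, assuming a hypothetical hand-written `LEXICON` and only the linking "s" as a joining element; this is not how PostgreSQL's dictionaries work internally.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical mini-lexicon; a real decompounder would need a large word list.
LEXICON = frozenset({"arbeiter", "arbeit", "unfall", "versicherung", "gesetz"})
# Linking elements allowed between parts (only the German "Fugen-s" here).
LINKERS = ("", "s")


def decompound(word: str, lexicon: frozenset = LEXICON) -> Optional[list[str]]:
    """Split word into known parts, trying the longest part first.

    Returns None when the word cannot be fully covered by the lexicon.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Optional[tuple[str, ...]]:
        if start == len(word):
            return ()
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in LINKERS:
                if word.startswith(linker, end):
                    rest = split_from(end + len(linker))
                    if rest is not None:
                        return (part, *rest)
        return None

    parts = split_from(0)
    return list(parts) if parts is not None else None


print(decompound("Arbeiterunfallversicherungsgesetz"))
print(decompound("Projektmanagement"))  # None: parts are not in the lexicon
```

With this five-word lexicon the example splits into its four parts, while any word whose parts are missing from the lexicon (such as a project-specific term) comes back as `None`.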
With !101565 (merged), we now also do prefix matching. So while this is not exactly the substring matching we had before, it brings the behavior closer to it.
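To illustrate why prefix matching only partly helps with compounds, a sketch comparing three matching modes on one hypothetical issue title. It assumes simple lowercase whitespace tokenization with no stemming, which simplifies what PostgreSQL actually does; the function names are hypothetical.

```python
def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization, with no stemming for simplicity."""
    return text.lower().split()


def whole_token_match(text: str, query: str) -> bool:
    """Plain full-text style match: the query must equal a whole token."""
    return query.lower() in tokens(text)


def prefix_match(text: str, query: str) -> bool:
    """Prefix match: the query must be the start of some token."""
    return any(token.startswith(query.lower()) for token in tokens(text))


def substring_match(text: str, query: str) -> bool:
    """Roughly the old ILIKE '%query%' pattern search."""
    return query.lower() in text.lower()


# Hypothetical issue title containing the compound word from the thread.
issue = "Fix reference to Arbeiterunfallversicherungsgesetz"
for query in ("arbeit", "versicherung", "gesetz"):
    print(
        query,
        whole_token_match(issue, query),
        prefix_match(issue, query),
        substring_match(issue, query),
    )
```

Only the leading fragment "arbeit" matches with prefix search; the interior and trailing parts ("versicherung", "gesetz") are still found only by substring matching.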