Support regular expressions with Advanced Global Search
UPDATE 2023-05-13: We are working on replacing our code search engine with Zoekt in &9404 as a solution to this problem. You can follow the progress in that epic.
Zendesk ticket: https://gitlab.zendesk.com/agent/tickets/85485 (GitLab internal-only link)
Problem
Sometimes you need regular expressions to search for very specific things in code. This is common when searching for usages of a library or a code style across multiple repositories, and it is usually important for developers doing large-scale refactorings across many projects. Developers are used to the flexibility of regexes for targeting and filtering very specific parts of code. Without this, they may be forced to clone every repository in a group in order to search across all of them, which is not practical at a large organization.
Solution
Add the ability to do regex searches in Advanced Global Search. Elasticsearch already supports the Regexp query, but it is unlikely to scale well when searching large groups. It is also not very accurate, and possibly not useful enough, because it applies the regex to the inverted index. The inverted index does not contain all of the text, only the tokens we happen to capture, and it often omits things like special characters. These are usually exactly what people search for with a regex, so many results will be missing with no way for the user to ever be sure.
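To illustrate the accuracy problem, here is a minimal sketch of what happens when a regex is applied to indexed tokens instead of the original text. The `tokenize` function is a hypothetical, heavily simplified stand-in for an analyzer (it is not the analyzer we actually configure), and the sample source line and pattern are made up for the example.

```python
import re


def tokenize(text):
    # Hypothetical simplified analyzer: lowercase the text and split on
    # anything that is not a letter or digit, so special characters are lost.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def regexp_on_tokens(pattern, text):
    # A term-level regexp has to match a whole indexed token.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(tok) for tok in tokenize(text))


def regexp_on_source(pattern, text):
    # What the user actually expects: match anywhere in the original text.
    return re.search(pattern, text) is not None


SOURCE = "result = cache->get(key);"
PATTERN = r"cache->get\("

tokens_hit = regexp_on_tokens(PATTERN, SOURCE)
source_hit = regexp_on_source(PATTERN, SOURCE)
```

In this sketch the pattern matches the source line but none of its tokens, because `->` and `(` never make it into the index. The document would silently drop out of the results.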
I found some discussion that described an efficient implementation. It uses a trigram index to first narrow the search down to a subset of candidate docs that might match the regexp (by breaking the regex up into partial regexes), and then runs an after script that does the final regexp filtering on that subset rather than on all docs.
This is similar to the approach taken by Google's internal code search tool and will give all matches for the regex with fairly good performance and fairly low storage usage.
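The two-phase idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not how Elasticsearch or Zoekt implement it: the index is an in-memory dict, all function names are hypothetical, and the literal substrings that every match must contain are passed in by hand, whereas a real implementation would derive them from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    # All overlapping 3-character substrings of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    # Map each trigram to the set of doc ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tri in trigrams(text):
            index[tri].add(doc_id)
    return index


def candidates(index, all_ids, required_literals):
    # Phase 1: keep only docs containing every trigram of every literal
    # that any match of the regex must contain.
    result = set(all_ids)
    for literal in required_literals:
        for tri in trigrams(literal):
            result &= index.get(tri, set())
    return result


def regex_search(docs, index, pattern, required_literals):
    # Phase 2: run the real regex, but only over the candidate docs.
    compiled = re.compile(pattern)
    cands = candidates(index, docs.keys(), required_literals)
    return sorted(d for d in cands if compiled.search(docs[d]))


# Made-up sample documents for the example.
DOCS = {
    "a.rb": "User.find_by(email: x)",
    "b.rb": "user = find(id)",
    "c.rb": "Account.find_by(name: n)",
}
INDEX = build_index(DOCS)
HITS = regex_search(DOCS, INDEX, r"\w+\.find_by\(", [".find_by("])
```

Here `b.rb` is eliminated by the trigram lookup alone, so the regex only runs against `a.rb` and `c.rb`. The trigram filter can return false positives but never false negatives, which is why the final regex pass is still needed to give exact results.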