Consider replacing SHA term matching on ngrams with prefix search
Summary
Currently, SHAs are indexed using edge ngrams from 5 to 40 characters long. This means that each 40-character SHA is split into 36 separate terms (one per prefix length), which takes up a lot of storage. SHAs are usually unique from their first 4 to 5 characters on, so a simple prefix search will be sufficiently fast and as effective as term matching on ngrams.
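To make the storage cost concrete, here is a small sketch that mimics what an edge ngram tokenizer emits for a single SHA. It assumes the tokenizer is an edge ngram tokenizer with `min_gram` 5 and `max_gram` 40; the helper `edge_ngrams` is hypothetical and only approximates Elasticsearch's behaviour for a single all-hex token.

```python
def edge_ngrams(token: str, min_gram: int = 5, max_gram: int = 40) -> list[str]:
    """Return the prefixes an edge ngram tokenizer would emit for `token`.

    Hypothetical helper: approximates an edge_ngram tokenizer with the
    assumed min_gram=5 / max_gram=40 settings for one hex token.
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


sha = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # a 40-character SHA-1
terms = edge_ngrams(sha)

print(len(terms))                 # 36 terms for one SHA
print(terms[0], terms[-1])        # shortest and longest prefix
print(sum(len(t) for t in terms)) # 810 characters of terms vs. 40 for a keyword
```

A keyword field, by contrast, stores the SHA once as a single 40-character term.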
Improvements
Replace the ngram analyzer with a lowercase keyword field and use a prefix query in the code. In our tests, this reduced the index size by about 13%.
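A minimal sketch of the proposed shape, written as Python dicts that serialize to Elasticsearch JSON. The field name `sha` and the normalizer name `sha_normalizer` are hypothetical and are not taken from replace_sha_ngrams_with_keyword.json; the sketch only shows a keyword field with a lowercase normalizer and a matching prefix query.

```python
import json

# Hypothetical index body: "sha" and "sha_normalizer" are illustrative names.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Custom normalizer that only lowercases the whole value.
                "sha_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            # One keyword term per SHA instead of one term per edge ngram.
            "sha": {"type": "keyword", "normalizer": "sha_normalizer"}
        }
    },
}


def sha_prefix_query(fragment: str, field: str = "sha") -> dict:
    """Build a prefix query for a (possibly abbreviated) SHA."""
    # Lowercase client-side so the value lines up with the normalized terms.
    return {"query": {"prefix": {field: {"value": fragment.strip().lower()}}}}


print(json.dumps(sha_prefix_query("4B825dc"), indent=2))
```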
Risks
GitLab's search needs to issue a prefix query instead of a term query. This will be a bit slower than term matching, and it increases the complexity of the client, since fields containing a SHA need to be queried differently from the rest.
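The extra client complexity could look roughly like the sketch below: SHA fields get a prefix clause while the remaining fields keep their normal full-text query, combined in a `bool` query. The field lists and the `build_commit_query` helper are hypothetical, and restricting the prefix clause to 5 to 40 hex characters is an assumed design choice that mirrors the old minimum ngram length, so very short inputs do not trigger broad prefix scans.

```python
import re

# Hypothetical field lists, for illustration only.
SHA_FIELDS = ["sha"]
TEXT_FIELDS = ["message", "author.name"]

# Assumed rule: only treat 5 to 40 hex characters as a possible SHA.
HEX_FRAGMENT = re.compile(r"[0-9a-f]{5,40}")


def build_commit_query(term: str) -> dict:
    """Combine full-text matching with a prefix query on SHA fields."""
    term = term.strip()
    should = [{"multi_match": {"query": term, "fields": TEXT_FIELDS}}]
    lowered = term.lower()
    # Only add the (slower) prefix clauses when the input could be a SHA.
    if HEX_FRAGMENT.fullmatch(lowered):
        should.extend({"prefix": {field: {"value": lowered}}} for field in SHA_FIELDS)
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(build_commit_query("4B825DC"))
print(build_commit_query("fix typo"))
```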
Elasticsearch mapping used for our tests: replace_sha_ngrams_with_keyword.json
Relates to #3327 (closed)