Change more Elasticsearch fields to keyword type
What does this MR do?
Related to #213035.
The Elasticsearch keyword type "is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags". This type is preferred over the current text type for these fields, because the text type takes up more storage.
The text type splits up a value as though it were human-readable text (i.e. splitting words apart) and indexes each word separately in the inverted index. As such, the text type will usually take up more space in the inverted index and should only be used when you need to search for individual words in the text.
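To make the storage argument concrete, here is a minimal sketch of a tiny inverted index built two ways. The "text" analyzer below is a deliberate simplification (lowercase, then split on non-alphanumeric characters). It is not Elasticsearch's standard analyzer or any analyzer GitLab actually configures. The branch names are hypothetical. Counting terms and postings is only a rough proxy for inverted-index size, and this sketch does not attempt to reproduce the ~4% figure measured locally.

```python
import re
from collections import defaultdict


def analyze_text(value: str) -> list[str]:
    """Rough stand-in for a text-type analyzer: lowercase, split on non-alphanumerics.

    This is a simplification, not Elasticsearch's standard analyzer or GitLab's analyzers.
    """
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


def analyze_keyword(value: str) -> list[str]:
    """Keyword-style: the whole value is indexed as a single, unchanged term."""
    return [value]


def build_index(docs: dict[int, str], analyzer) -> dict[str, set[int]]:
    """Map each term to the set of document IDs containing it (a tiny inverted index)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return dict(index)


def postings(index: dict[str, set[int]]) -> int:
    """Total number of (term, document) entries stored in the index."""
    return sum(len(ids) for ids in index.values())


# Hypothetical branch names.
branches = {
    1: "feature/add-elasticsearch-keyword-mappings",
    2: "fix/elasticsearch-index-size",
    3: "master",
}

text_index = build_index(branches, analyze_text)
keyword_index = build_index(branches, analyze_keyword)
print("text:", len(text_index), "terms,", postings(text_index), "postings")  # text: 9 terms, 10 postings
print("keyword:", len(keyword_index), "terms,", postings(keyword_index), "postings")  # keyword: 3 terms, 3 postings

# merge_status-style values: word splitting lets a lookup for "merged" hit both statuses.
statuses = {1: "can_be_merged", 2: "cannot_be_merged", 3: "unchecked"}
print(sorted(build_index(statuses, analyze_text)["merged"]))  # [1, 2]
print(build_index(statuses, analyze_keyword).get("merged", set()))  # set()
```

The last two lines show the correctness side of the argument under the same simplified model. When values are split into words, a lookup for one word matches both can_be_merged and cannot_be_merged. A keyword field only ever matches the whole value.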
For each of the fields changed in this MR, the text type adds no value and may make certain searches incorrect. In local testing, this change appears to save ~4% of disk storage.
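As a hypothetical illustration, the per-field change amounts to declaring the affected fields as keyword instead of text in the index mapping. The sketch below builds an Elasticsearch-style mapping body for the field names discussed in this MR. Several assumptions apply:

- The nesting of the commit email fields is assumed.
- All fields are placed in one tree purely for illustration, even though they belong to different document types.
- This is not how GitLab's own code defines its mappings.

```python
import json

# Hypothetical sketch: field names come from this MR's per-field list, but the
# nesting of the commit email fields is an assumption, not GitLab's real mapping code.
CHANGED_FIELDS = [
    "state",
    "merge_status",
    "target_branch",
    "source_branch",
    "commit.author.email",
    "commit.committer.email",
]


def keyword_mapping(fields: list[str]) -> dict:
    """Build an Elasticsearch-style mapping body declaring each field as keyword.

    Dotted names become nested object fields, each with its own "properties".
    """
    root: dict = {}
    for dotted in fields:
        node = root
        *parents, leaf = dotted.split(".")
        for parent in parents:
            node = node.setdefault(parent, {"properties": {}})["properties"]
        # Per this MR, these fields were previously mapped as {"type": "text"}.
        node[leaf] = {"type": "keyword"}
    return {"properties": root}


print(json.dumps(keyword_mapping(CHANGED_FIELDS), indent=2))
```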
Impact for each field
As per #213035 (comment 439629162), here is the reasoning on a per-field basis:
- state/merge_status => We only do exact matches against this for filtering. It's only one word, so changing it to keyword won't make any difference.
- target_branch/source_branch => These are not used in any searches today, so there is no risk in changing the index options. Changing them to keyword should give a decent storage improvement, as branch names can be quite long and composed of many words.
- merge_status => This is not used in any searches today, so there is no risk in changing the index options. Its values appear to be things like can_be_merged/cannot_be_merged/unchecked, which implies to me that it should be a keyword anyway. Splitting these values by word would produce wrong results if we ever did filter on this field, and the change will save some storage.
- commit.(committer/author).email => This is used in commit searches today, and it's hard to know exactly how our current users rely on it. Users will lose some behaviour if they were previously searching for partial email addresses. For example, you can currently search for dyl.griffith and you will find commits authored by my email address, which starts with dyl.griffith. After this change to keyword, you'd need to search for the entire exact email address, or use the prefix search dyl.griffith* instead. However, since prefix searches (wildcards) can only be used at the end of a word, you will no longer be able to search for griffith alone after this change.
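The behaviour change for commit email searches can be modelled in a few lines. This sketch rests on several assumptions:

- A simplified text analysis (lowercase, split on non-alphanumerics) stands in for whatever analyzer the current text mapping uses.
- The address dyl.griffith@example.com is hypothetical.
- Keyword semantics are modelled as an exact, case-sensitive match, plus a prefix match when the query ends in a trailing *.

It is not how GitLab actually builds its search queries.

```python
import re

# Hypothetical address used only for illustration.
EMAIL = "dyl.griffith@example.com"


def text_terms(value: str) -> set[str]:
    """Simplified text-type analysis: lowercase, split on non-alphanumerics."""
    return {term for term in re.split(r"[^0-9a-z]+", value.lower()) if term}


def matches_text(query: str, value: str) -> bool:
    """Match when every term in the query appears among the value's terms."""
    query_terms = text_terms(query)
    return bool(query_terms) and query_terms <= text_terms(value)


def matches_keyword(query: str, value: str) -> bool:
    """Keyword-style match: the whole value is one term.

    A trailing '*' makes it a prefix match; otherwise the match must be exact
    (and case-sensitive, since this sketch applies no normalizer).
    """
    if query.endswith("*"):
        return value.startswith(query[:-1])
    return query == value


for query in ["dyl.griffith", "griffith", "dyl.griffith*", EMAIL]:
    print(f"{query:26} text={matches_text(query, EMAIL)!s:5} keyword={matches_keyword(query, EMAIL)}")
```

Under this model, both dyl.griffith and griffith match the text-style field. On the keyword-style field, only the full address and the dyl.griffith* prefix match. This mirrors the trade-off described for the commit email fields.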
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team