Elasticsearch: Pre-process code before indexing
I think we need to make code indexing smarter and language-aware. We should pre-process code before indexing so that every piece of code is broken down into terms such as class names, methods, and variables. With that done, the ES filters and tokenizers could be simpler. We would still need to break names into micro terms (NameSpace -> name + space).
As a bonus, this would make the blob index smaller. See https://gitlab.com/gitlab-org/gitlab-ee/issues/3327
Making search language-aware would also open up big possibilities for new features.
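
To make the pre-processing idea more concrete, here is a rough sketch of what such a step could look like. It only handles Python source (via the standard `ast` module) as a stand-in for a real per-language parser, and the function names and output field names (`class_names`, `method_names`, `variable_names`, `subwords`) are made up for illustration, not an existing GitLab or Elasticsearch API.

```python
import ast
import re

# Matches an acronym run (HTTP in HTTPServer), a capitalized or
# lowercase word, or a run of digits. Underscores are skipped.
_SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Break an identifier into lowercase micro terms."""
    return [part.lower() for part in _SUBWORD.findall(name)]


def extract_terms(source):
    """Collect class, method, and assigned variable names from Python source."""
    terms = {"class": set(), "method": set(), "variable": set()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            terms["class"].add(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            terms["method"].add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            terms["variable"].add(node.id)
    return terms


def build_document(source):
    """Build a hypothetical index document: typed names plus their micro terms."""
    terms = extract_terms(source)
    subwords = set()
    for names in terms.values():
        for name in names:
            subwords.update(split_identifier(name))
    return {
        "class_names": sorted(terms["class"]),
        "method_names": sorted(terms["method"]),
        "variable_names": sorted(terms["variable"]),
        "subwords": sorted(subwords),
    }


SAMPLE = '''
class NameSpace:
    def find_blob(self, path):
        blob_index = path
        return blob_index
'''

if __name__ == "__main__":
    print(build_document(SAMPLE))
```

With a document shaped like this, each field could be indexed with a plain keyword or lowercase analyzer instead of relying on pattern-based tokenizers over the whole blob. Whether the index actually ends up smaller would need to be measured.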