Elasticsearch: Pre-process code before indexing

I think we need to make code indexing smarter and language-aware. We should pre-process code so that every piece of it is broken down into terms such as class names, methods, variables, and so on. With that in place, the ES filters and tokenizers could be simpler. We would still need to break names into micro terms (NameSpace -> name + space).
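To make the idea concrete, here is a rough sketch of what such a pre-processing step might look like. It is only an illustration under several assumptions: it handles Python source only (via the standard `ast` module), and the names `split_identifier`, `extract_terms`, and `build_document`, as well as the field names in the resulting document, are hypothetical and not part of any existing GitLab or Elasticsearch code.

```python
import ast
import re

# Matches, in order: an acronym followed by a capitalized word (HTTP in HTTPServer),
# a lowercase or Capitalized word, a trailing run of capitals, or a run of digits.
_TERM_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(name):
    """Break a name into lowercase micro terms (NameSpace -> name + space)."""
    return [m.group(0).lower() for m in _TERM_RE.finditer(name)]


def extract_terms(source):
    """Group identifiers found in Python source by kind (hypothetical categories)."""
    terms = {"class": [], "method": [], "variable": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            terms["class"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            terms["method"].append(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Only names that are assigned to, not every reference.
            terms["variable"].append(node.id)
    return terms


def build_document(source):
    """Build a dict that could be sent to the index instead of the raw blob."""
    doc = {}
    for kind, names in extract_terms(source).items():
        unique = sorted(set(names))
        doc[kind] = unique
        doc[kind + "_terms"] = sorted(
            {term for name in unique for term in split_identifier(name)}
        )
    return doc


SAMPLE = """
class NameSpace:
    def findByName(self, name):
        resultList = []
        return resultList
"""

if __name__ == "__main__":
    print(build_document(SAMPLE))
```

With a document shaped like this, the index would hold a few short keyword lists per blob rather than the full text, and the analyzer on the ES side would not need to guess where identifiers start and end. Other languages would need their own parser in place of `ast`.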

As a bonus, this would make the blob index smaller. See https://gitlab.com/gitlab-org/gitlab-ee/issues/3327

Making search language-aware would also open up huge possibilities for new features.

Edited Jun 27, 2025 by 🤖 GitLab Bot 🤖