Experiment: Semantic search with Elastic Learned Sparse EncodeR (ELSER)
Problem to solve
Currently, GitLab search applies very little ranking and relevance tuning to its results. As a result, results are often suboptimal and not relevant to the user, even when they technically match the query's keywords. Semantic search is one way to improve relevance without investing heavily in relevance and ranking tuning, but it carries its own engineering overhead. Typically, semantic search requires generating embeddings with a language model, and the encoding model usually needs fine-tuning to work well on a given dataset. Fine-tuning can be challenging and expensive to do correctly.
Proposal
Test the quality of semantic search by using Elastic's Learned Sparse Encoder (ELSER) to bootstrap semantic search without the overhead of choosing a model, fine-tuning it, and generating embeddings. To start, we would use ELSER to encode projects, issues, issue comments, and MR comments on the GitLab project for a fixed date range (for example, the last 12 months).*
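As a rough illustration of what querying ELSER-encoded documents could look like, the sketch below builds a search request body using Elasticsearch's `text_expansion` query. The field name `ml.tokens`, the model ID `.elser_model_1`, and the helper `elser_query` are assumptions for illustration only; the real values depend on the ingest pipeline and the ELSER version deployed on the cluster, and newer Elasticsearch releases may prefer a different query type.

```python
def elser_query(user_query, tokens_field="ml.tokens",
                model_id=".elser_model_1", size=10):
    """Build a search body that scores documents with ELSER.

    tokens_field and model_id are assumed defaults, not GitLab's
    actual configuration.
    """
    return {
        "size": size,
        "query": {
            "text_expansion": {
                # Field holding the sparse token weights written at ingest time.
                tokens_field: {
                    "model_id": model_id,
                    # The query text is expanded by the same model at search time.
                    "model_text": user_query,
                }
            }
        },
    }


body = elser_query("how do I rotate a deploy token")
```

The resulting dictionary could be passed as the request body to the search endpoint of whichever index holds the encoded issues and comments.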
Deliverables
- A simple UI to compare the results of the two searches; we can leverage an existing project for that.
- Experimental hybrid (semantic + lexical) and semantic search options on the advanced search page, if we find that the semantic results are better than the default lexical results.
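One common way to combine lexical and semantic rankings into a hybrid result list is reciprocal rank fusion (RRF). The sketch below is only one possible approach, not something this proposal commits to; the function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result IDs from each search.
lexical = ["issue-12", "issue-7", "mr-3"]
semantic = ["issue-7", "issue-40", "issue-12"]
hybrid = reciprocal_rank_fusion([lexical, semantic])
```

In this example, `issue-7` and `issue-12` appear in both lists and therefore rank ahead of the documents found by only one search.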
*Note: This wouldn't tie us to this approach long term. It's just an experiment and is not designed to be the end solution. If it works well, it will help validate the semantic search use case and show that we should invest time, effort, and money in a long-term solution.
Risks
The risks are primarily engineering time and the additional cost incurred by our Elasticsearch cluster, should the results not turn out to be consistently better. We can mitigate those risks by comparing results from our current search with ELSER results before we implement anything in the product (deliverable 1).
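A first, cheap signal for the comparison described above is how much the two result lists differ for the same query. The sketch below computes the Jaccard overlap of the top-k results; the function name and sample IDs are hypothetical. It only measures how different the lists are, not which one is better, so relevance would still need to be judged by people or against labeled queries.

```python
def overlap_at_k(results_a, results_b, k=10):
    """Jaccard overlap between the top-k document IDs of two result lists.

    Returns 1.0 when both top-k sets are identical (including both empty)
    and 0.0 when they share no documents.
    """
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    union = top_a | top_b
    if not union:
        return 1.0
    return len(top_a & top_b) / len(union)


# Hypothetical result IDs for one query.
current_results = ["a", "b", "c", "d"]
elser_results = ["b", "e", "a", "f"]
score = overlap_at_k(current_results, elser_results, k=3)
```

Queries with low overlap are the most useful ones to review by hand, since that is where the two searches disagree.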
Benefits
- Shorter time-to-validation of semantic search. This approach lets us test semantic search without having to implement a full vector search setup and fine-tune a model.
- Best-in-class solution out of the box. In accepted benchmarks (BEIR), ELSER outperformed 11 of 12 major models on search relevance tasks, including OpenAI DaVinci and Ada.
- Low risk relative to the engineering time, effort, and hosting costs of implementing a full hybrid search setup.
- Potentially a "good enough" solution to take to production on .com to further validate demand for semantic search, similar to how we're running the Zoekt beta.
- Can be leveraged by Duo Chat to generate relevant results for LLM context.