Embedding Elasticsearch reference
What does this MR do and why?
This MR introduces a new Elasticsearch reference for embeddings. The reference can handle an embedding for any model that has a database record, not only issues.
- Creates a feature flag for generating embeddings: [Feature flag] Rollout of `elasticsearch_issue_... (#461624 - closed)
- Adds callbacks to create and add references for issues under the following conditions:
  - On create
  - On update if `title` or `description` was changed
  - Only issues (not work items or epics)
  - Only public issues
  - `ai_global_switch` feature flag enabled
  - `elasticsearch_issue_upsert` feature flag enabled: this allows partial indexing
  - `elasticsearch_issue_embedding` feature flag enabled for the issue's namespace
  - `Gitlab::Saas.feature_available?(:ai_vertex_embeddings)`
  - Vectors are supported: Elasticsearch 8+ is required
  - `add_embedding_to_issues` migration is finished
- Creates a new reference called `Embedding`:
  - Serializes into `"Embedding|model_klass|id|routing"`, e.g. `"Embedding|Issue|23|project_676"`
  - `as_indexed_json` only contains the `embedding` and `embedding_version`. Because we use upsert, we can do partial indexing.
- Every reference generates an embedding by making a call to the Vertex API. Before calling the API, we use `ApplicationRateLimiter` to check whether the endpoint is throttled. If it is, we raise and catch an exception to put the reference back into the queue so that it is retried on the next cron run.
- Schedules `ElasticIndexEmbeddingBulkCronWorker` to run every minute to process embedding references from the queue.
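The reference format and the throttled requeue described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `EmbeddingRef`, `ThrottledError`, `process_queue`, and the `throttled` and `embed` callables are hypothetical stand-ins, not the classes added in this MR, and the rate limiter and Vertex API call are replaced by lambdas.

```ruby
# Hypothetical stand-ins for illustration; not the classes from this MR.
PREFIX = 'Embedding'
DELIMITER = '|'

class ThrottledError < StandardError; end

EmbeddingRef = Struct.new(:model_klass, :id, :routing) do
  # "Embedding|model_klass|id|routing"
  def serialize
    [PREFIX, model_klass, id, routing].join(DELIMITER)
  end

  def self.deserialize(string)
    prefix, model_klass, id, routing = string.split(DELIMITER)
    raise ArgumentError, "not an embedding reference: #{string}" unless prefix == PREFIX

    new(model_klass, Integer(id), routing)
  end
end

# Processes serialized references. `throttled` stands in for the rate limiter
# check and `embed` for the embeddings API call (both assumptions).
# Throttled references are collected so they can be retried on the next run.
def process_queue(queue, throttled:, embed:)
  processed = {}
  requeued = []

  queue.each do |serialized|
    ref = EmbeddingRef.deserialize(serialized)
    begin
      raise ThrottledError if throttled.call

      processed[serialized] = embed.call(ref)
    rescue ThrottledError
      requeued << serialized
    end
  end

  [processed, requeued]
end

# Usage: allow a single API call, then report the endpoint as throttled.
calls = 0
one_call_only = -> { (calls += 1) > 1 }
fake_embed = ->(_ref) { [0.1, 0.2, 0.3] }

queue = [
  EmbeddingRef.new('Issue', 23, 'project_676').serialize,
  EmbeddingRef.new('Issue', 339, 'project_14').serialize
]
processed, requeued = process_queue(queue, throttled: one_call_only, embed: fake_embed)
puts processed.keys.inspect
puts requeued.inspect
```

In this sketch the second reference is returned in `requeued` rather than dropped, which mirrors the idea of retrying throttled references on the next cron run.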
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Ensure migrations have completed: `Elastic::MigrationWorker.new.perform`
- Set up local Vertex AI access by either following these instructions or asking @maddievn for her sandbox `project_id`
- `bundle exec rake gitlab:duo:enable_feature_flags`
- `Feature.disable(:use_ai_gateway_proxy)`
- Update an existing issue's title or description or create a new issue (has to be in a public project).
- Note that the logs show two references added. If you don't see the embedding reference tracked, verify which of the conditions in the check is not met.
  `{"class":"Elastic::ProcessBookkeepingService","meta.indexing.redis_set":"elastic:incremental:updates:0:zset","meta.indexing.tracked_items_encoded":"[[232,\"Issue 339 339 project_14\"]]"}`
  `{"class":"Search::Elastic::ProcessEmbeddingBookkeepingService","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}`
- Execute the bookkeeping services: `Search::Elastic::ProcessEmbeddingBookkeepingService.new.execute; Elastic::ProcessBookkeepingService.new.execute`
- Verify that the document in Elasticsearch now has an embedding
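When checking the logs in the steps above, it can help to decode the `meta.indexing.tracked_items_encoded` field, which is itself a JSON string nested inside the JSON log line. The following is a minimal sketch using only Ruby's standard `json` library. The helper name `tracked_embedding_refs` is hypothetical, and reading the first element of each pair as a score is an assumption based on the sorted-set name in the log.

```ruby
require 'json'

# Hypothetical helper: extracts embedding references from one log line.
# Assumes each tracked item is a [score, serialized_reference] pair.
def tracked_embedding_refs(log_line)
  entry = JSON.parse(log_line)
  items = JSON.parse(entry.fetch('meta.indexing.tracked_items_encoded'))

  items.map do |score, ref|
    prefix, model_klass, id, routing = ref.split('|')
    next unless prefix == 'Embedding' # skip non-embedding references

    { score: score, model_klass: model_klass, id: Integer(id), routing: routing }
  end.compact
end

# Log line copied from the example above (single quotes keep the \" escapes).
log_line = '{"class":"Search::Elastic::ProcessEmbeddingBookkeepingService","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}'

puts tracked_embedding_refs(log_line).inspect
```

An empty result for the `ProcessEmbeddingBookkeepingService` line would suggest that no embedding reference was tracked for the issue.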
Related to #457724 (closed)