Embedding Elasticsearch reference
What does this MR do and why?
This MR introduces a new Elastic reference for embeddings. The reference is able to deal with any embedding that has a database record, not only issues.
- Creates a feature flag for generating embeddings: [Feature flag] Rollout of `elasticsearch_issue_... (#461624 - closed)
- Adds callbacks to create and add references for issues under the following conditions:
  - On create
  - On update if `title` or `description` was changed
  - Only issues (not work items or epics)
  - Only public issues
  - `ai_global_switch` feature flag enabled
  - `elasticsearch_issue_upsert` feature flag enabled: this allows partial indexing
  - `elasticsearch_issue_embedding` feature flag enabled for the issue's namespace
  - `Gitlab::Saas.feature_available?(:ai_vertex_embeddings)`
  - Vectors are supported: Elasticsearch 8+ is required
  - `add_embedding_to_issues` migration is finished
- Creates a new reference called `Embedding`:
  - Serializes into `"Embedding|model_klass|id|routing"`, e.g. `"Embedding|Issue|23|project_676"`
  - `as_indexed_json` only contains the `embedding` and `embedding_version`. Because we use upsert, we can do partial indexing.
- Every reference generates an embedding by making a call to the Vertex API. Before calling the API, we use `ApplicationRateLimiter` to check whether the endpoint is throttled. If it is, we raise and catch an exception to put the reference back into the queue, so that it can be retried on the next cron run.
- Schedules `ElasticIndexEmbeddingBulkCronWorker` to run every minute to process embedding references from the queue.
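The reference format and the throttle-and-requeue behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the code added in this MR: `EmbeddingRef`, `ThrottledError`, and `process_refs` are hypothetical names, and the `throttled` callable merely stands in for the `ApplicationRateLimiter` check.

```ruby
# Hypothetical, simplified stand-ins for illustration; not GitLab's actual classes.
class EmbeddingRef
  PREFIX = 'Embedding'
  DELIMITER = '|'

  attr_reader :model_klass, :id, :routing

  def initialize(model_klass, id, routing)
    @model_klass = model_klass
    @id = id
    @routing = routing
  end

  # Builds "Embedding|model_klass|id|routing".
  def serialize
    [PREFIX, model_klass, id, routing].join(DELIMITER)
  end

  # Parses a serialized reference back into its parts.
  def self.deserialize(string)
    prefix, model_klass, id, routing = string.split(DELIMITER, 4)
    raise ArgumentError, "not an embedding reference: #{string}" unless prefix == PREFIX

    new(model_klass, Integer(id), routing)
  end
end

class ThrottledError < StandardError; end

# Processes each reference unless the endpoint is throttled. `throttled` is a
# callable standing in for the rate limiter check (an assumption of this sketch).
# Returns the references that should go back into the queue for the next run.
def process_refs(refs, throttled:)
  requeue = []
  refs.each do |ref|
    raise ThrottledError if throttled.call

    yield ref # e.g. generate and index the embedding
  rescue ThrottledError
    requeue << ref
  end
  requeue
end

ref = EmbeddingRef.new('Issue', 23, 'project_676')
ref.serialize # => "Embedding|Issue|23|project_676"
```

Returning the throttled references instead of dropping them mirrors the idea that nothing is lost when the endpoint is rate limited: the next cron run simply picks them up again.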
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Ensure migrations have completed: `Elastic::MigrationWorker.new.perform`
- Set up local Vertex AI access by either following these instructions or asking @maddievn for her sandbox `project_id`
- `bundle exec rake gitlab:duo:enable_feature_flags`
- `Feature.disable(:use_ai_gateway_proxy)`
- Update an existing issue's title or description, or create a new issue (it has to be in a public project).
- Note that two references are added in the logs. If you don't see the embedding tracked, check which condition is failing in the check.
  `{"class":"Elastic::ProcessBookkeepingService","meta.indexing.redis_set":"elastic:incremental:updates:0:zset","meta.indexing.tracked_items_encoded":"[[232,\"Issue 339 339 project_14\"]]"}`
  `{"class":"Search::Elastic::ProcessEmbeddingBookkeepingService","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}`
- Execute the bookkeeping services: `Search::Elastic::ProcessEmbeddingBookkeepingService.new.execute; Elastic::ProcessBookkeepingService.new.execute`
- Verify that the document in Elasticsearch now has an embedding
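To check the log step without reading the JSON by eye, a small script along these lines may help. It is only a sketch: `tracked_references` and `embedding_tracked?` are hypothetical helpers, and it assumes each log entry is one JSON object per line with the `meta.indexing.tracked_items_encoded` field shaped as in the sample above.

```ruby
require 'json'

# Returns the reference strings tracked by one log line, or [] if the line
# has no tracked items. The field holds a JSON-encoded array of pairs whose
# second element is the serialized reference.
def tracked_references(log_line)
  encoded = JSON.parse(log_line)['meta.indexing.tracked_items_encoded']
  return [] unless encoded

  JSON.parse(encoded).map { |_first, ref| ref }
end

# True if any log line tracked an Embedding reference for the given issue ID.
def embedding_tracked?(log_lines, issue_id)
  log_lines.any? do |line|
    tracked_references(line).any? { |ref| ref.start_with?("Embedding|Issue|#{issue_id}|") }
  end
end

# Sample lines taken from the logs shown above (shortened to the relevant fields).
logs = <<~'LOG'.lines.map(&:strip)
  {"class":"Elastic::ProcessBookkeepingService","meta.indexing.tracked_items_encoded":"[[232,\"Issue 339 339 project_14\"]]"}
  {"class":"Search::Elastic::ProcessEmbeddingBookkeepingService","meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}
LOG

puts embedding_tracked?(logs, 339)
```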
Related to #457724 (closed)