
Embedding Elasticsearch reference

What does this MR do and why?

This MR introduces a new Elastic reference for embeddings. The reference is able to deal with any embedding that has a database record, not only issues.

  • Creates a feature flag for generating embeddings: [Feature flag] Rollout of `elasticsearch_issue_... (#461624 - closed)
  • Adds callbacks to create and add references for issues under the following conditions:
    • On create
    • On update if title or description was changed
    • Only issues (not workitems or epics)
    • Only public issues
    • ai_global_switch feature flag enabled
    • elasticsearch_issue_upsert feature flag enabled: this allows partial indexing
    • elasticsearch_issue_embedding feature flag enabled for the issue's namespace
    • Gitlab::Saas.feature_available?(:ai_vertex_embeddings)
    • Vectors are supported: Elasticsearch 8+ is required
    • add_embedding_to_issues migration is finished
  • Creates a new reference called Embedding:
    • Serializes into "Embedding|model_klass|id|routing" e.g. "Embedding|Issue|23|project_676"
    • as_indexed_json only contains the embedding and embedding_version. Because we use upsert, we can do partial indexing.
  • Every reference generates an embedding by calling the Vertex API. Before calling the API, we use ApplicationRateLimiter to check whether the endpoint is throttled. If it is, we raise and catch an exception to put the reference back into the queue so that it is retried on the next cron run.
  • Schedules ElasticIndexEmbeddingBulkCronWorker to run every minute to process embedding references from the queue.
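
To illustrate the reference format and the retry-on-throttle behaviour described above, here is a minimal, self-contained Ruby sketch. It is not the code in this MR: the module `EmbeddingSketch`, the `Ref` struct, `ThrottledError`, and the `throttled:` callable are all hypothetical stand-ins (the real implementation uses ApplicationRateLimiter and the bookkeeping services), and the plain array stands in for the Redis-backed queue.

```ruby
# Hypothetical sketch; names below are illustrative, not the MR's classes.
module EmbeddingSketch
  PREFIX = 'Embedding'
  DELIMITER = '|'

  # Raised when the (assumed) rate limit check reports throttling.
  ThrottledError = Class.new(StandardError)

  Ref = Struct.new(:model_klass, :id, :routing) do
    # Builds "Embedding|model_klass|id|routing".
    def serialize
      [PREFIX, model_klass, id, routing].join(DELIMITER)
    end
  end

  # Parses a serialized reference back into its parts.
  def self.parse(string)
    prefix, model_klass, id, routing = string.split(DELIMITER)
    raise ArgumentError, "not an embedding reference: #{string}" unless prefix == PREFIX

    Ref.new(model_klass, Integer(id), routing)
  end

  # Walks the queue once. A throttled reference raises, the exception is
  # caught, and the serialized reference is kept for the next run.
  def self.process(queue, throttled:)
    processed = []
    retry_later = []

    queue.each do |serialized|
      ref = parse(serialized)
      begin
        raise ThrottledError if throttled.call(ref)

        processed << ref # a real worker would generate the embedding here
      rescue ThrottledError
        retry_later << serialized
      end
    end

    [processed, retry_later]
  end
end

ref = EmbeddingSketch::Ref.new('Issue', 23, 'project_676')
puts ref.serialize # prints "Embedding|Issue|23|project_676"
```

In this sketch, `EmbeddingSketch.process(queue, throttled: ->(ref) { ... })` returns the references that were handled and the serialized references to retry, which mirrors the idea of putting throttled references back into the queue for the next cron run.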

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. Ensure migrations have completed: Elastic::MigrationWorker.new.perform

  2. Set up local Vertex AI access by either following these instructions or asking @maddievn for her sandbox project_id

  3. bundle exec rake gitlab:duo:enable_feature_flags

  4. Feature.disable(:use_ai_gateway_proxy)

  5. Update an existing issue's title or description or create a new issue (has to be in a public project).

  6. Check the logs: two references should have been added, as in the example below. If the embedding reference is not tracked, review the conditions listed above to see which one is not met.

    {"class":"Elastic::ProcessBookkeepingService","meta.indexing.redis_set":"elastic:incremental:updates:0:zset","meta.indexing.tracked_items_encoded":"[[232,\"Issue 339 339 project_14\"]]"}
    {"class":"Search::Elastic::ProcessEmbeddingBookkeepingService","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}
  7. Execute the bookkeeping services: Search::Elastic::ProcessEmbeddingBookkeepingService.new.execute; Elastic::ProcessBookkeepingService.new.execute

  8. Verify that the document in Elasticsearch now has an embedding
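
As an optional aid for step 6, the following Ruby sketch checks whether a log line tracks an embedding reference for a given issue. It assumes the log lines have the JSON shape shown in the example above, with `meta.indexing.tracked_items_encoded` holding a JSON-encoded array of pairs whose second element is the serialized reference. The helper names `tracked_references` and `embedding_tracked?` are hypothetical and not part of GitLab.

```ruby
require 'json'

# Returns the serialized references tracked by one log line.
# Each tracked item is a two-element array; only the second element
# (the reference string) is used here.
def tracked_references(log_line)
  entry = JSON.parse(log_line)
  items = JSON.parse(entry.fetch('meta.indexing.tracked_items_encoded'))
  items.map { |_first, reference| reference }
end

# True if any log line tracks "Embedding|Issue|<issue_id>|...".
def embedding_tracked?(log_lines, issue_id)
  log_lines.any? do |line|
    tracked_references(line).any? do |reference|
      reference.start_with?("Embedding|Issue|#{issue_id}|")
    end
  end
end

# Single-quoted so the backslashes in the log line are kept verbatim.
line = '{"class":"Search::Elastic::ProcessEmbeddingBookkeepingService",' \
       '"meta.indexing.redis_set":"elastic:embedding:updates:1:zset",' \
       '"meta.indexing.tracked_items_encoded":"[[187,\"Embedding|Issue|339|project_14\"]]"}'

puts embedding_tracked?([line], 339) # prints "true"
```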

Related to #457724 (closed)

Edited by Madelein van Niekerk
