Backfill gitlab issue embeddings

Madelein van Niekerk requested to merge 456918-backfill-embeddings into master

What does this MR do and why?

Backfills embeddings for GitLab group issues on gitlab.com: issues from the gitlab-org/gitlab project that were updated within the last year.

The expected runtime is 5 hours.

The migration only runs on gitlab.com and is skipped for other instances.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Logs

{"severity":"INFO","time":"2024-06-11T11:05:51.509Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] executing migrate method","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:05:51.549Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.581Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.584Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"index_name":"gitlab-development-issues","batch_size":200,"message":"[Elastic::Migration: 20240610133559] Start backfilling fields"}
{"severity":"DEBUG","time":"2024-06-11T11:05:51.833Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"track_items","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.count":10,"meta.indexing.tracked_items_encoded":"[[1,\"Embedding|Issue|547|project_13\"],[2,\"Embedding|Issue|553|project_13\"],[3,\"Embedding|Issue|322|project_13\"],[4,\"Embedding|Issue|546|project_13\"],[5,\"Embedding|Issue|549|project_13\"],[6,\"Embedding|Issue|551|project_13\"],[7,\"Embedding|Issue|554|project_13\"],[8,\"Embedding|Issue|535|project_12\"],[9,\"Embedding|Issue|537|project_12\"],[10,\"Embedding|Issue|294|project_12\"]]"}
{"severity":"DEBUG","time":"2024-06-11T11:05:51.833Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"track_items","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.count":14,"meta.indexing.tracked_items_encoded":"[[1,\"Embedding|Issue|548|project_13\"],[2,\"Embedding|Issue|552|project_13\"],[3,\"Embedding|Issue|314|project_13\"],[4,\"Embedding|Issue|545|project_13\"],[5,\"Embedding|Issue|550|project_13\"],[6,\"Embedding|Issue|295|project_12\"],[7,\"Embedding|Issue|536|project_12\"],[8,\"Embedding|Issue|539|project_12\"],[9,\"Embedding|Issue|541|project_12\"],[10,\"Embedding|Issue|544|project_12\"],[11,\"Embedding|Issue|538|project_12\"],[12,\"Embedding|Issue|540|project_12\"],[13,\"Embedding|Issue|542|project_12\"],[14,\"Embedding|Issue|543|project_12\"]]"}
{"severity":"INFO","time":"2024-06-11T11:05:51.833Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"index_name":"gitlab-development-issues","documents_count":24,"message":"[Elastic::Migration: 20240610133559] Backfilling batch has been processed"}
{"severity":"INFO","time":"2024-06-11T11:05:51.842Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.872Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.875Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] updating with completed: false","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:05:51.922Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.950Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.954Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] kicking off next migration batch","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:06:11.752Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_start","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.records_count":10,"meta.indexing.first_score":1.0,"meta.indexing.last_score":10.0}
{"severity":"INFO","time":"2024-06-11T11:06:11.752Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_start","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.records_count":14,"meta.indexing.first_score":1.0,"meta.indexing.last_score":14.0}
{"severity":"INFO","time":"2024-06-11T11:07:00.400Z","message":"bulk_submitted","meta.indexing.body_size_bytes":393525,"meta.indexing.bulk_count":24,"meta.indexing.errors_count":0}
{"severity":"INFO","time":"2024-06-11T11:07:00.404Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexer_flushed","meta.indexing.search_flushing_duration_s":0.04783699999097735,"meta.indexing.search_indexed_bytes_per_second":8089}
{"severity":"INFO","time":"2024-06-11T11:07:00.447Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_end","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.records_count":10,"meta.indexing.first_score":1.0,"meta.indexing.last_score":10.0,"meta.indexing.failures_count":0,"meta.indexing.bulk_execution_duration_s":48.695372}
{"severity":"INFO","time":"2024-06-11T11:07:00.448Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_end","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.records_count":14,"meta.indexing.first_score":1.0,"meta.indexing.last_score":14.0,"meta.indexing.failures_count":0,"meta.indexing.bulk_execution_duration_s":48.695644}

How to set up and validate locally

  1. Change the skip condition on the migration so that it evaluates to false, which simulates gitlab.com
  2. Change the group IDs to groups in your local environment that have public issues
  3. Make sure you can generate embeddings (reference)
  4. Execute the migration: Elastic::MigrationWorker.new.perform
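
To picture what steps 1 and 4 exercise, here is a toy model in plain Ruby. It is only a sketch that is consistent with the log sequence above (a batch is queued, `remaining_count` stays at 24 until the embeddings are indexed, and the worker reports `completed: false` in the meantime). Every class and method name in it is hypothetical; it does not reproduce the real migration or `Elastic::MigrationWorker`.

```ruby
# Toy model only: all names here are hypothetical and do not mirror
# GitLab's real migration classes.
class ToyBackfillMigration
  BATCH_SIZE = 200 # same value as "batch_size":200 in the logs

  def initialize(docs_without_embedding:, skip:)
    @missing = docs_without_embedding.dup # ids still lacking an embedding
    @skip = skip                          # step 1: false simulates gitlab.com
    @queue = []                           # stands in for the Redis tracking sets
  end

  def skip?
    @skip
  end

  def remaining_count
    @missing.size
  end

  def completed?
    remaining_count.zero?
  end

  # Queue one batch for embedding generation. Documents still count as
  # remaining until the queue has been processed.
  def migrate
    @queue |= @missing.first(BATCH_SIZE)
  end

  # Stand-in for the asynchronous embedding and bulk-indexing job.
  def process_queue
    @missing -= @queue
    @queue.clear
  end
end

# Stand-in for the worker invoked in step 4 (not the real worker).
def run_worker(migration)
  return :skipped if migration.skip?

  migration.migrate
  migration.completed? ? :completed : :in_progress
end

skipped = ToyBackfillMigration.new(docs_without_embedding: (1..24).to_a, skip: true)
puts run_worker(skipped)        # prints "skipped"

migration = ToyBackfillMigration.new(docs_without_embedding: (1..24).to_a, skip: false)
puts run_worker(migration)      # prints "in_progress"
puts migration.remaining_count  # prints "24"
migration.process_queue
puts run_worker(migration)      # prints "completed"
```

In this model the first worker run cannot finish the migration, because the embeddings are produced asynchronously; a later run observes a remaining count of zero and marks it complete.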

Related to #456918 (closed)

Edited by Madelein van Niekerk