ActiveContext: Skip current model during model backfill (!225666)

What does this MR do and why?

This MR enables skipping the current embedding model during backfill operations in ActiveContext. When migrating to a new embedding model, we need to backfill embeddings using only the next model, not the current one, to avoid redundant processing.

How it works:

Dedicated CodeBackfill Queue - A new CodeBackfill queue inherits from Code and sets preprocess_options to { next_model_only: true }, so backfill operations process only the next embedding model. A separate queue also removes a race condition in detecting when the backfill is done: previously, incremental indexing kept adding refs to the same queue, which made it hard to tell when the backfill had finished. With a dedicated queue, an empty queue means the backfill is done.
Options Pipeline - To support passing next_model_only through the processing chain:
- Preprocessor concern now accepts and forwards **options to preprocessor blocks
- Queue concern provides preprocess_options method (defaults to empty hash)
- BulkProcessQueue passes queue.preprocess_options to preprocessing
- Reference.preprocess_references accepts and forwards options
Model Filtering - Reference#indexing_embedding_models now accepts a next_model_only parameter and returns only the next model when it is set.
Collection-Agnostic Design - The Collection concern has a backfill_queue method that defaults to the main queue but can be overridden, which makes this extensible to other collections.
BackfillEmbeddings Task - Uses collection_class.backfill_queue to get the appropriate queue.
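
The flow described above can be sketched in plain Ruby. This is a simplified illustration, not the code in this MR: the Sketch module, the CodeCollection class, the process helper, the :current_model / :next_model symbols, and all method bodies are assumptions made for the example. Only the names Code, CodeBackfill, preprocess_options, preprocess_references, indexing_embedding_models, next_model_only and backfill_queue come from the description.

```ruby
# Illustrative stand-ins only; the real ActiveContext classes differ.
module Sketch
  # Main queue: preprocess_options defaults to an empty hash.
  class Code
    def self.preprocess_options
      {}
    end
  end

  # Dedicated backfill queue: same behaviour, but asks for the next model only.
  class CodeBackfill < Code
    def self.preprocess_options
      { next_model_only: true }
    end
  end

  # Stand-in reference with a current and a next embedding model.
  class Reference
    # Returns the models to generate embeddings for.
    def indexing_embedding_models(next_model_only: false)
      return [:next_model] if next_model_only

      [:current_model, :next_model]
    end

    # Accepts options and forwards them to each reference.
    def self.preprocess_references(refs, **options)
      refs.map { |ref| ref.indexing_embedding_models(**options) }
    end
  end

  # Stand-in collection: backfill_queue defaults to the main queue.
  class Collection
    def self.queue
      Code
    end

    def self.backfill_queue
      queue
    end
  end

  # A collection that overrides backfill_queue with the dedicated queue.
  class CodeCollection < Collection
    def self.backfill_queue
      CodeBackfill
    end
  end

  # Stand-in for the bulk processing step: the queue's options are
  # passed through to preprocessing.
  def self.process(queue, refs)
    Reference.preprocess_references(refs, **queue.preprocess_options)
  end
end

refs = [Sketch::Reference.new]
p Sketch.process(Sketch::Code, refs)
p Sketch.process(Sketch::CodeCollection.backfill_queue, refs)
```

In this sketch the main queue yields both models for each reference, while the queue returned by backfill_queue yields only the next model, which is the behaviour the dedicated queue is meant to provide.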

Why?

During embedding model migrations, we need to backfill the new model without reprocessing the current model. This MR provides a clean, extensible way to handle this by using a dedicated queue with specific preprocessing options.

References

Related to #589327 (closed)

How to test

Run the activation service to switch to a new field (while keeping the same model)

Ai::ActiveContext::EmbeddingModelActivationService.new(collection_class: Ai::ActiveContext::Collections::Code, model_ref: "text_embedding_005_vertex", dimensions: 768).execute!

Run task worker once

Ai::ActiveContext::TaskWorker.new.perform

Verify embeddings_v2 field added to mapping
Run task worker once

Ai::ActiveContext::TaskWorker.new.perform

Verify items were added to the CodeBackfill queue

::Ai::ActiveContext::Queues::CodeBackfill.queued_items # has refs
::Ai::ActiveContext::Queues::Code.queued_items # no refs

Execute the queues

ActiveContext.execute_all_queues!

Verify that embeddings are generated once (on master they are generated twice: once for the current model and once for the next model)
Verify that embeddings_v2 is populated
Run task worker

Ai::ActiveContext::TaskWorker.new.perform

Repeat until the backfill task is marked as completed
Run task worker

Ai::ActiveContext::TaskWorker.new.perform

Run task worker

Ai::ActiveContext::TaskWorker.new.perform

Now embeddings_v1 should be nullified
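
The "run the task worker until the backfill task is marked as completed" step can be scripted instead of repeated by hand. The helper below is a hypothetical convenience, not part of this MR: run_until and its parameters are invented for illustration, and the check for task completion is left to the caller because the description does not say how to query it.

```ruby
# Hypothetical helper: calls `step` until `done` returns true.
# Returns the number of times `step` ran, or raises after max_runs attempts.
def run_until(max_runs:, step:, done:)
  max_runs.times do |runs_so_far|
    return runs_so_far if done.call

    step.call
  end

  raise "not completed after #{max_runs} runs" unless done.call

  max_runs
end

# Stand-in demo: pretend the backfill needs three worker runs to finish.
remaining = 3
runs = run_until(
  max_runs: 10,
  step: -> { remaining -= 1 },
  done: -> { remaining.zero? }
)
puts "completed after #{runs} runs"
```

In a Rails console, step could be -> { Ai::ActiveContext::TaskWorker.new.perform }, with done supplied as whatever check you use to confirm that the backfill task has completed.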

Edited Mar 06, 2026 by Madelein van Niekerk
