ActiveContext: Skip current model during model backfill

What does this MR do and why?

This MR enables skipping the current embedding model during backfill operations in ActiveContext. When migrating to a new embedding model, we need to backfill embeddings using only the next model, not the current one, to avoid redundant processing.

How it works:

  1. Dedicated CodeBackfill Queue - A new CodeBackfill queue inherits from Code and sets preprocess_options to { next_model_only: true }, ensuring backfill operations only process the next embedding model. A separate queue also removes a race condition in detecting when the backfill is done: on the shared queue, incremental indexing keeps adding refs, so an empty queue is not a reliable completion signal. With a dedicated queue, the backfill is done when the queue is empty.

  2. Options Pipeline - To support passing next_model_only through the processing chain:

    • Preprocessor concern now accepts and forwards **options to preprocessor blocks
    • Queue concern provides preprocess_options method (defaults to empty hash)
    • BulkProcessQueue passes queue.preprocess_options to preprocessing
    • Reference.preprocess_references accepts and forwards options

  3. Model Filtering - Reference#indexing_embedding_models now accepts a next_model_only parameter and returns only the next model when it is set.

  4. Collection-Agnostic Design - Collection concern has a backfill_queue method that defaults to the main queue but can be overridden, making this extensible for other collections.

  5. BackfillEmbeddings Task - Uses collection_class.backfill_queue to get the appropriate queue.
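
The flow described in the steps above can be sketched in plain Ruby. This is an illustration only, not the MR's actual code: the Sketch module, its class names, the in-memory items array, and the process and backfill_done? helpers are all hypothetical stand-ins. Only preprocess_options, next_model_only, indexing_embedding_models, preprocess_references and backfill_queue come from the description.

```ruby
# Hypothetical stand-ins for illustration; not the real ActiveContext classes.
module Sketch
  class Reference
    CURRENT_MODEL = :embeddings_v1
    NEXT_MODEL = :embeddings_v2

    # Returns the models to generate embeddings for.
    # With next_model_only, the current model is skipped.
    def self.indexing_embedding_models(next_model_only: false)
      next_model_only ? [NEXT_MODEL] : [CURRENT_MODEL, NEXT_MODEL]
    end

    # Accepts the queue-supplied options and forwards them to model selection.
    def self.preprocess_references(refs, **options)
      models = indexing_embedding_models(**options)
      refs.map { |ref| { ref: ref, models: models } }
    end
  end

  # Main queue: no special preprocessing options.
  class CodeQueue
    # Class-level instance variable, so each subclass gets its own item list.
    def self.items
      @items ||= []
    end

    def self.push(*refs)
      items.concat(refs)
    end

    def self.preprocess_options
      {}
    end
  end

  # Dedicated backfill queue: only the next model is processed.
  class CodeBackfillQueue < CodeQueue
    def self.preprocess_options
      { next_model_only: true }
    end
  end

  # Base collection: backfill_queue defaults to the main queue.
  class Collection
    def self.queue
      raise NotImplementedError
    end

    def self.backfill_queue
      queue
    end
  end

  # The code collection overrides backfill_queue with the dedicated queue.
  class CodeCollection < Collection
    def self.queue
      CodeQueue
    end

    def self.backfill_queue
      CodeBackfillQueue
    end
  end

  # Drains a queue and passes its preprocess_options down the chain.
  def self.process(queue)
    refs = queue.items.shift(queue.items.size)
    Reference.preprocess_references(refs, **queue.preprocess_options)
  end

  # Incremental indexing never writes to the backfill queue in this sketch,
  # so an empty backfill queue means the backfill is done.
  def self.backfill_done?(collection)
    collection.backfill_queue.items.empty?
  end
end
```

In this sketch, refs pushed to Sketch::CodeBackfillQueue are processed for the next model only, while refs pushed to Sketch::CodeQueue by incremental indexing are still processed for both models and do not affect backfill_done?.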

Why?

During embedding model migrations, we need to backfill the new model without reprocessing the current model. This MR provides a clean, extensible way to handle this by using a dedicated queue with specific preprocessing options.

References

Related to #589327 (closed)

How to test

  • Run activation service to switch to a new field (but use the same model)
Ai::ActiveContext::EmbeddingModelActivationService.new(collection_class: Ai::ActiveContext::Collections::Code, model_ref: "text_embedding_005_vertex", dimensions: 768).execute!
  • Run task worker once
Ai::ActiveContext::TaskWorker.new.perform
  • Verify embeddings_v2 field added to mapping
  • Run task worker once
Ai::ActiveContext::TaskWorker.new.perform
  • Verify items were added to the CodeBackfill queue
::Ai::ActiveContext::Queues::CodeBackfill.queued_items # has refs
::Ai::ActiveContext::Queues::Code.queued_items # no refs
  • Execute the queues
ActiveContext.execute_all_queues!
  • Verify that embeddings are generated once (on master they are generated twice: once for the current model and once for the next model)
  • Verify that embeddings_v2 is populated
  • Run task worker repeatedly until the backfill task is marked as completed
Ai::ActiveContext::TaskWorker.new.perform
  • Run task worker
Ai::ActiveContext::TaskWorker.new.perform
  • Run task worker
Ai::ActiveContext::TaskWorker.new.perform
  • Verify that embeddings_v1 is now nullified
Edited by Madelein van Niekerk
