[ActiveContext] Adjust throughput for BulkProcessWorker

Context

From #554923 (comment 2657881221):

The embeddings generation is really slow, mostly due to the BulkProcessWorker processing only 1000 documents at a time with no parallelization.

Additionally, for the initial indexing of gitlab-org/gitlab, the embeddings generation part of the pipeline is taking about 5 hours and counting.

The BulkProcessWorker

The Ai::ActiveContext::BulkProcessWorker includes the ActiveContext::Concerns::BulkAsyncProcess concern, which provides the core processing logic.

  • the BulkProcessWorker is scheduled to run every minute.
  • on perform, the BulkProcessWorker picks up N refs in M parallel shards from the ActiveContext queue (ActiveContext::Queues.all_queued_items), where:
    • N = the shard_limit, currently 1000
    • M = the number_of_shards, currently 1
  • for each parallel shard, after the N refs have been processed, the worker re-enqueues the BulkProcessWorker for that shard after 1 second
  • a lock mechanism enforces uniqueness per queue (e.g. the Code Embeddings queue) and shard, so that the job scheduled by cron to run every minute does not interfere with a job scheduled via re-enqueue. This lock has a TTL of 10 minutes

What this means is that currently, the BulkProcessWorker generates embeddings for batches of 1000 documents / code chunks, with each batch processed in sequence.
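The batching behaviour described above can be simulated with a minimal plain-Ruby sketch (not the actual GitLab classes; the queue contents and drain loop are illustrative assumptions):

```ruby
# Simulate how BulkProcessWorker drains a queue: each perform cycle
# picks up at most SHARD_LIMIT refs per shard, then re-enqueues
# itself for that shard until the shard is empty.
NUMBER_OF_SHARDS = 1   # current value
SHARD_LIMIT      = 1000 # current value

# A hypothetical queue of 5,000 refs, distributed across the shards.
refs = (1..5_000).to_a
shards = Array.new(NUMBER_OF_SHARDS) { [] }
refs.each_with_index { |ref, i| shards[i % NUMBER_OF_SHARDS] << ref }

cycles = 0
until shards.all?(&:empty?)
  # Each cycle removes up to SHARD_LIMIT refs from every shard.
  shards.each { |shard| shard.shift(SHARD_LIMIT) }
  cycles += 1
end

puts cycles # => 5
```

With a single shard, 5,000 refs take 5 sequential cycles; with 4 shards of the same limit, the same queue would drain in 2.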


Proposal

We should resolve this in 2 steps:

1 - update throughput in code

Given the extremely low usage of textembedding-gecko (aka text-embedding-005) in production, as well as in staging even with the Code Embeddings pipeline enabled there, we can update the shard_limit and number_of_shards in Ai::ActiveContext::Queues::Code based on our findings for the initial indexing of gitlab-org/gitlab on staging.

Usage in Staging and Production

| Staging | Production |
| --- | --- |
| Currently processing Code Embeddings for gitlab-org/gitlab | Processing embeddings for Advanced Search |
| Screenshot_2025-07-30_at_16.42.46 | Screenshot_2025-07-30_at_16.42.36 |

Proposed Changes

On ee/lib/ai/active_context/queues/code.rb, update:

```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..2dba14bc553a 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,15 @@ class Code
        class << self
          # having a single shard means we have absolute control over the amount of embeddings we generate in one go
          def number_of_shards
-            1
+            4
+          end
+
+          # override the shard limit defined in ::ActiveContext::Concerns::Queue
+          # for now, let's go with 1000 since we can probably achieve more throughput
+          # by just parallelizing the processing alone
+          # we override it here to make it easier to update later
+          def shard_limit
+            1000
          end
        end
```
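As a back-of-the-envelope check on what this change buys us, the sketch below estimates hourly throughput from the two knobs. The `seconds_per_batch` figure is a hypothetical assumption, not a measured value:

```ruby
# Rough throughput estimate: shards run in parallel, so document
# throughput scales linearly with number_of_shards for a fixed
# per-batch duration.
def docs_per_hour(number_of_shards:, shard_limit:, seconds_per_batch:)
  batches_per_hour = 3600.0 / seconds_per_batch
  (number_of_shards * shard_limit * batches_per_hour).round
end

# Current configuration: one batch of 1000 at a time.
puts docs_per_hour(number_of_shards: 1, shard_limit: 1000, seconds_per_batch: 60) # => 60000
# Proposed configuration: four parallel shards.
puts docs_per_hour(number_of_shards: 4, shard_limit: 1000, seconds_per_batch: 60) # => 240000
```

This assumes the embeddings endpoint can absorb 4x the request rate, which the low textembedding-gecko usage above suggests it can.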

2 - allow a more flexible way to adjust both the number_of_shards and shard_limit

Perhaps we can introduce a setting specific to ActiveContext (not ApplicationSettings) so we can adjust the number_of_shards and shard_limit for any pipeline. For now, we are only concerned with the Code Embeddings pipeline but we may add more pipelines later. We need to make sure that this setting can handle different embeddings pipelines. Ideally, this should also be adjustable per instance.

The example below assumes we have a new table called ai_active_context_settings with fields name (to identify each embeddings pipeline) and options (a jsonb column holding the actual settings).

Note: this is just an example, and we can discuss other approaches to make these settings flexible and adjustable per instance

```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..1347ae9c3f60 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,11 @@ class Code
        class << self
          # having a single shard means we have absolute control over the amount of embeddings we generate in one go
          def number_of_shards
-            1
+            Ai::ActiveContext::Setting.find_by(name: 'code').options[:number_of_shards]
+          end
+
+          def shard_limit
+            Ai::ActiveContext::Setting.find_by(name: 'code').options[:shard_limit]
          end
        end
```
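One detail worth discussing for whatever approach we pick: `find_by` returns `nil` when no row exists, so the lookup as written would raise `NoMethodError` for an unconfigured pipeline. A plain-Ruby sketch of the lookup with a fallback default (the stand-in rows, field names, and defaults here are all illustrative assumptions):

```ruby
require 'json'

# Stand-in for the hypothetical ai_active_context_settings table,
# with `options` stored as jsonb (modelled here as a JSON string).
SETTINGS_ROWS = [
  { name: 'code', options: { 'number_of_shards' => 4, 'shard_limit' => 1000 }.to_json }
].freeze

# Assumed defaults for pipelines with no settings row.
DEFAULTS = { 'number_of_shards' => 1, 'shard_limit' => 1000 }.freeze

def setting_for(pipeline, key)
  row = SETTINGS_ROWS.find { |r| r[:name] == pipeline }
  options = row ? JSON.parse(row[:options]) : {}
  # Fall back to a default instead of raising for unconfigured pipelines.
  options.fetch(key, DEFAULTS.fetch(key))
end

puts setting_for('code', 'number_of_shards')     # => 4
puts setting_for('duo_chat', 'number_of_shards') # => 1
```

The fallback also means new pipelines get safe conservative values until someone explicitly tunes them per instance.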