[ActiveContext] Adjust throughput for BulkProcessWorker

Context

From #554923 (comment 2657881221):

The embeddings generation is really slow, mostly due to the BulkProcessWorker processing only 1000 documents at a time with no parallelization.

Additionally, for the initial indexing of gitlab-org/gitlab, the embeddings generation part of the pipeline is taking about 5 hours and counting.

The BulkProcessWorker

The Ai::ActiveContext::BulkProcessWorker includes the ActiveContext::Concerns::BulkAsyncProcess concern, which provides the core processing logic.

  • the BulkProcessWorker is scheduled to run every minute.
  • on perform, the BulkProcessWorker picks up N refs in M parallel shards from the ActiveContext queue (ActiveContext::Queues.all_queued_items), where:
    • N = the shard_limit, currently 1000
    • M = the number_of_shards, currently 1
  • for each parallel shard, after the N refs have been processed, the worker re-enqueues the BulkProcessWorker for that shard after 1 second
  • a lock mechanism enforces uniqueness per queue (e.g. the Code Embeddings queue) and shard, so that the job scheduled by cron to run every minute does not interfere with a job scheduled via re-enqueue. This lock has a TTL of 10 minutes

What this means is that currently, the BulkProcessWorker generates embeddings for batches of 1000 documents / code chunks, with each batch processed in sequence.
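The batching behaviour described above can be simulated with a minimal plain-Ruby sketch (not the actual GitLab classes; the queue contents and drain loop are illustrative assumptions):

```ruby
# Simulate how BulkProcessWorker drains a queue: each perform cycle
# picks up at most SHARD_LIMIT refs per shard, then re-enqueues
# itself for that shard until the shard is empty.
NUMBER_OF_SHARDS = 1   # current value
SHARD_LIMIT      = 1000 # current value

# A hypothetical queue of 5,000 refs, distributed across the shards.
refs = (1..5_000).to_a
shards = Array.new(NUMBER_OF_SHARDS) { [] }
refs.each_with_index { |ref, i| shards[i % NUMBER_OF_SHARDS] << ref }

cycles = 0
until shards.all?(&:empty?)
  # Each cycle removes up to SHARD_LIMIT refs from every shard.
  shards.each { |shard| shard.shift(SHARD_LIMIT) }
  cycles += 1
end

puts cycles # => 5
```

With a single shard, 5,000 refs take 5 sequential cycles; with 4 shards of the same limit, the same queue would drain in 2.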


Proposal

We should resolve this in 2 steps:

1 - update throughput in code

Given the extremely low usage of textembedding-gecko (aka text-embedding-005) in production, as well as in staging even with the Code Embeddings pipeline enabled there, we can update the shard_limit and number_of_shards in Ai::ActiveContext::Queues::Code based on our findings for the initial indexing of gitlab-org/gitlab on staging.

Usage in Staging and Production

| Staging | Production |
| --- | --- |
| Currently processing Code Embeddings for gitlab-org/gitlab | Processing embeddings for Advanced Search |
| Screenshot_2025-07-30_at_16.42.46 | Screenshot_2025-07-30_at_16.42.36 |

Proposed Changes

On ee/lib/ai/active_context/queues/code.rb, update:

```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..2dba14bc553a 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,15 @@ class Code
        class << self
          # having a single shard means we have absolute control over the amount of embeddings we generate in one go
          def number_of_shards
-            1
+            4
+          end
+
+          # override the shard limit defined in ::ActiveContext::Concerns::Queue
+          # for now, let's go with 1000 since we can probably achieve more throughput
+          # by just parallelizing the processing alone
+          # we override it here to make it easier to update later
+          def shard_limit
+            1000
          end
        end
```
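As a back-of-the-envelope check on what this change buys us, the sketch below estimates hourly throughput from the two knobs. The `seconds_per_batch` figure is a hypothetical assumption, not a measured value:

```ruby
# Rough throughput estimate: shards run in parallel, so document
# throughput scales linearly with number_of_shards for a fixed
# per-batch duration.
def docs_per_hour(number_of_shards:, shard_limit:, seconds_per_batch:)
  batches_per_hour = 3600.0 / seconds_per_batch
  (number_of_shards * shard_limit * batches_per_hour).round
end

# Current configuration: one batch of 1000 at a time.
puts docs_per_hour(number_of_shards: 1, shard_limit: 1000, seconds_per_batch: 60) # => 60000
# Proposed configuration: four parallel shards.
puts docs_per_hour(number_of_shards: 4, shard_limit: 1000, seconds_per_batch: 60) # => 240000
```

This assumes the embeddings endpoint can absorb 4x the request rate, which the low textembedding-gecko usage above suggests it can.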

2 - allow a more flexible way to adjust both the number_of_shards and shard_limit

Perhaps we can introduce a setting specific to ActiveContext (not ApplicationSettings) so we can adjust the number_of_shards and shard_limit for any pipeline. For now, we are only concerned with the Code Embeddings pipeline but we may add more pipelines later. We need to make sure that this setting can handle different embeddings pipelines. Ideally, this should also be adjustable per instance.

The example below assumes we have a new table called ai_active_context_settings with fields name (to identify each embeddings pipeline) and options (a jsonb column holding the actual settings).

Note: this is just an example, and we can discuss other approaches to make these settings flexible and adjustable per instance

```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..1347ae9c3f60 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,11 @@ class Code
        class << self
          # having a single shard means we have absolute control over the amount of embeddings we generate in one go
          def number_of_shards
-            1
+            Ai::ActiveContext::Setting.find_by(name: 'code').options[:number_of_shards]
+          end
+
+          def shard_limit
+            Ai::ActiveContext::Setting.find_by(name: 'code').options[:shard_limit]
          end
        end
```
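One detail worth discussing for whatever approach we pick: `find_by` returns `nil` when no row exists, so the lookup as written would raise `NoMethodError` for an unconfigured pipeline. A plain-Ruby sketch of the lookup with a fallback default (the stand-in rows, field names, and defaults here are all illustrative assumptions):

```ruby
require 'json'

# Stand-in for the hypothetical ai_active_context_settings table,
# with `options` stored as jsonb (modelled here as a JSON string).
SETTINGS_ROWS = [
  { name: 'code', options: { 'number_of_shards' => 4, 'shard_limit' => 1000 }.to_json }
].freeze

# Assumed defaults for pipelines with no settings row.
DEFAULTS = { 'number_of_shards' => 1, 'shard_limit' => 1000 }.freeze

def setting_for(pipeline, key)
  row = SETTINGS_ROWS.find { |r| r[:name] == pipeline }
  options = row ? JSON.parse(row[:options]) : {}
  # Fall back to a default instead of raising for unconfigured pipelines.
  options.fetch(key, DEFAULTS.fetch(key))
end

puts setting_for('code', 'number_of_shards')     # => 4
puts setting_for('duo_chat', 'number_of_shards') # => 1
```

The fallback also means new pipelines get safe conservative values until someone explicitly tunes them per instance.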