[ActiveContext] Adjust throughput for BulkProcessWorker
Context
From #554923 (comment 2657881221):
The embeddings generation is really slow, mostly due to the `BulkProcessWorker` processing only 1000 documents at a time with no parallelization.
Additionally, for the initial indexing of gitlab-org/gitlab, the embeddings generation part of the pipeline is taking about 5 hours and counting.
The BulkProcessWorker
The `Ai::ActiveContext::BulkProcessWorker` includes `ActiveContext::Concerns::BulkAsyncProcess`, which essentially provides the processing logic.
- the `BulkProcessWorker` is scheduled to run every minute
- on `perform`, the `BulkProcessWorker` picks up N refs in M parallel shards from the ActiveContext queue (`ActiveContext::Queues.all_queued_items`), where:
  - N = the `shard_limit`, currently 1000
  - M = the `number_of_shards`, currently 1
- for each parallel shard, after the N refs have been processed, it re-enqueues the `BulkProcessWorker` for that shard after 1 second
- there is a lock mechanism that checks uniqueness on the queue (e.g. the Code Embeddings queue) and the shard, so that the job scheduled by cron to run every minute does not interfere with the job scheduled on a re-enqueue. This lock mechanism has a TTL of 10 minutes.
What this means is that, currently, the `BulkProcessWorker` generates embeddings for batches of 1000 documents / code chunks, with each batch processed in sequence.
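The per-shard batching described above can be sketched in plain Ruby. This is a hypothetical, simplified model (the real worker is Sidekiq-based and reads from Redis-backed queues): refs are partitioned across `number_of_shards` shards, and each `perform` drains at most `shard_limit` refs from one shard, re-enqueueing itself while work remains.

```ruby
# Simplified, self-contained model of the BulkProcessWorker batching
# behaviour. All names here are illustrative, not the actual classes.
class FakeBulkProcessor
  SHARD_LIMIT = 1000
  NUMBER_OF_SHARDS = 1

  attr_reader :processed

  def initialize(refs)
    # partition queued refs across shards, as the real queue does by shard key
    @shards = Array.new(NUMBER_OF_SHARDS) { [] }
    refs.each_with_index { |ref, i| @shards[i % NUMBER_OF_SHARDS] << ref }
    @processed = []
  end

  # one `perform` invocation for a single shard: take up to SHARD_LIMIT refs;
  # returns true when refs remain, i.e. when the worker would re-enqueue itself
  def perform(shard)
    batch = @shards[shard].shift(SHARD_LIMIT)
    @processed.concat(batch)
    !@shards[shard].empty?
  end
end

# draining a shard holding 2500 refs takes 3 sequential perform calls
processor = FakeBulkProcessor.new((1..2500).to_a)
runs = 0
loop do
  runs += 1
  break unless processor.perform(0)
end
```

With a single shard, every batch runs in sequence, which is exactly the bottleneck this issue describes.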
Settings configurable in code
- the `number_of_shards` can be updated in `Ai::ActiveContext::Queues::Code`
- the `shard_limit` is defined in `ActiveContext::Concerns::Queue` in the `gitlab-activecontext` gem
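A condensed, hypothetical sketch of where the two values live today (the real concern and queue classes carry much more logic; this only shows the override relationship between the gem default and the queue class):

```ruby
# Gem-side default (sketch of ActiveContext::Concerns::Queue in
# gitlab-activecontext; the actual module does far more than this).
module ActiveContext
  module Concerns
    module Queue
      def shard_limit
        1000
      end
    end
  end
end

# Rails-side queue class (sketch of Ai::ActiveContext::Queues::Code).
module Ai
  module ActiveContext
    module Queues
      class Code
        extend ::ActiveContext::Concerns::Queue

        class << self
          # having a single shard means processing is fully sequential
          def number_of_shards
            1
          end
        end
      end
    end
  end
end
```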
References
- description of how BulkProcessWorker processes queued documents: https://gitlab.com/gitlab-org/gitlab/-/issues/552304#note_2611397900
Proposal
We should resolve this in 2 steps:
1 - update throughput in code
Given the extremely low usage for textembedding-gecko (aka text-embedding-005) in production, as well as in staging even with the Code Embeddings pipeline enabled there, we can update the `shard_limit` and `number_of_shards` in `Ai::ActiveContext::Queues::Code` based on our findings for the initial indexing of gitlab-org/gitlab on staging.
Usage in Staging and Production
| Staging | Production |
|---|---|
| Currently processing Code Embeddings for gitlab-org/gitlab | Processing embeddings for Advanced Search |
| *(screenshot)* | *(screenshot)* |
Proposed Changes
On `ee/lib/ai/active_context/queues/code.rb`, update:

```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..2dba14bc553a 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,15 @@ class Code
   class << self
     # having a single shard means we have absolute control over the amount of embeddings we generate in one go
     def number_of_shards
-      1
+      4
+    end
+
+    # override the shard limit defined in ::ActiveContext::Concerns::Queue
+    # for now, let's go with 1000 since we can probably achieve more throughput
+    # by just parallelizing the processing alone
+    # we override it here to make it easier to update later
+    def shard_limit
+      1000
     end
   end
```
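Back-of-envelope arithmetic for the change above: if one batch of `shard_limit` refs takes roughly B seconds to embed (B would need to be measured on staging; 29 seconds below is purely an assumed placeholder), raising `number_of_shards` from 1 to 4 roughly quadruples throughput, since shards process in parallel:

```ruby
# Hypothetical throughput estimate; batch_seconds is an assumed measurement,
# and the +1 accounts for the 1-second re-enqueue delay between batches.
def refs_per_hour(shard_limit:, number_of_shards:, batch_seconds:)
  batches_per_hour = 3600.0 / (batch_seconds + 1)
  (batches_per_hour * shard_limit * number_of_shards).floor
end

before = refs_per_hour(shard_limit: 1000, number_of_shards: 1, batch_seconds: 29)
after  = refs_per_hour(shard_limit: 1000, number_of_shards: 4, batch_seconds: 29)
# with these assumed numbers: 120_000 refs/hour before, 480_000 after
```

This ignores embedding-API rate limits, which is why checking the current usage headroom (see the table above) matters before raising the parallelism.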
2 - allow a more flexible way to adjust both `number_of_shards` and `shard_limit`
Perhaps we can introduce a setting specific to ActiveContext (not ApplicationSettings) so we can adjust the number_of_shards and shard_limit for any pipeline. For now, we are only concerned with the Code Embeddings pipeline but we may add more pipelines later. We need to make sure that this setting can handle different embeddings pipelines. Ideally, this should also be adjustable per instance.
In the example below, assume we have a new table called `ai_active_context_settings` with fields `name` (one row per embeddings pipeline) and `options` (a jsonb column holding the actual settings).
Note: this is just an example, and we can discuss other approaches to make these settings flexible and adjustable per instance
```diff
diff --git a/ee/lib/ai/active_context/queues/code.rb b/ee/lib/ai/active_context/queues/code.rb
index 35ab0f446e26..1347ae9c3f60 100644
--- a/ee/lib/ai/active_context/queues/code.rb
+++ b/ee/lib/ai/active_context/queues/code.rb
@@ -7,7 +7,11 @@ class Code
   class << self
     # having a single shard means we have absolute control over the amount of embeddings we generate in one go
     def number_of_shards
-      1
+      Ai::ActiveContext::Setting.find_by(name: 'code').options[:number_of_shards]
+    end
+
+    def shard_limit
+      Ai::ActiveContext::Setting.find_by(name: 'code').options[:shard_limit]
     end
   end
```
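One wrinkle with the example diff: `find_by` returns `nil` when no row exists, and a jsonb `options` column typically round-trips string keys, so the lookup probably wants safe defaults. A plain-Ruby sketch of a defensive accessor (hypothetical names; the real thing would be an ActiveRecord model, with `store` below standing in for the `ai_active_context_settings` table):

```ruby
# Hypothetical settings lookup with fallback defaults; `store` maps a
# pipeline name to its options hash (string keys, as jsonb would store them).
class EmbeddingsPipelineSettings
  DEFAULTS = { 'number_of_shards' => 1, 'shard_limit' => 1000 }.freeze

  def initialize(store)
    @store = store
  end

  # fall back to DEFAULTS when the row or the individual key is missing
  def fetch(pipeline, key)
    options = @store[pipeline] || {}
    options.fetch(key, DEFAULTS.fetch(key))
  end
end

settings = EmbeddingsPipelineSettings.new('code' => { 'number_of_shards' => 4 })
settings.fetch('code', 'number_of_shards')      # overridden per instance
settings.fetch('code', 'shard_limit')           # missing key => default 1000
settings.fetch('other_pipeline', 'shard_limit') # missing row => default 1000
```

Defaults in code plus per-row overrides keeps the worker functional on instances that never touch the new table, which matters if this is meant to be adjustable per instance.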

