Bulk code embeddings generation can exceed Vertex AI's total token limit
Context
Background
The Ai::ActiveContext::BulkProcessWorker gets a list of jobs from an ActiveContext queue and processes them. This worker is used in the code embeddings generation process to generate embeddings for a batch of contents by calling:
# `Ai::ActiveContext::Queues::Code` points to a queue containing a list of
# Elasticsearch documents, each of which has a field for raw code content
::Ai::ActiveContext::BulkProcessWorker.new.perform("Ai::ActiveContext::Queues::Code", 0)
In this scenario, specific to Ai::ActiveContext::Queues::Code, the BulkProcessWorker ultimately calls ActiveContext::Embeddings, which in turn calls Gitlab::Llm::VertexAi::Embeddings::Text#execute:
# note that `content` here is an *Array* of code snippets
embeddings = Gitlab::Llm::VertexAi::Embeddings::Text
  .new(content, user: user, tracking_context: { action: action }, unit_primitive: unit_primitive, model: model)
  .execute
Problem
If there are a lot of items in the Ai::ActiveContext::Queues::Code queue, the inputs to the VertexAi::Embeddings::Text bulk embeddings generation request could have a total token count that exceeds Vertex AI's limit, resulting in this error:
Retryable Error occurred: ["Unable to submit request because the input token count is 20546 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models\"]
Backtrace
["/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/lib/gitlab/llm/vertex_ai/embeddings/text.rb:29:in `execute'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/embeddings.rb:9:in `generate_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:82:in `block in generate_embeddings_for_each_version'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `each_with_object'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `generate_embeddings_for_each_version'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:48:in `block (3 levels) in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:44:in `each_value'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:44:in `block (2 levels) in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `each_slice'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `block in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:60:in `with_batch_handling'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:22:in `apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/lib/ai/active_context/references/code.rb:17:in `block in <class:Code>'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:26:in `block (2 levels) in preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:23:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:23:in `block in preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:19:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:19:in `preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/reference.rb:38:in `preprocess_references'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:51:in `process'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:17:in `block in process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/redis/wrapper.rb:29:in `block in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:110:in `block (2 levels) in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:109:in `handle_interrupt'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:109:in `block in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:106:in `handle_interrupt'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:106:in `with'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/redis/wrapper.rb:29:in `with'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/redis.rb:6:in `with_redis'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:17:in `process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:6:in `process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:38:in `block in process_shard'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:43:in `block in in_lock'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:53:in `with_instrumentation'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:42:in `in_lock'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:37:in `process_shard'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:22:in `perform'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/app/workers/concerns/geo/skip_secondary.rb:14:in `perform'",
"(pry):14:in `generate_embeddings'",
"(pry):26:in `__pry__'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:290:in `eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:290:in `evaluate_ruby'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:659:in `handle_line'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:261:in `block (2 levels) in eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:260:in `catch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:260:in `block in eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:259:in `catch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:259:in `eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:77:in `block in repl'",
"<internal:kernel>:187:in `loop'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:67:in `repl'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:38:in `block in start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/input_lock.rb:61:in `__with_ownership'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/input_lock.rb:78:in `with_ownership'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:38:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:15:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-byebug-3.11.0/lib/pry-byebug/pry_ext.rb:15:in `start_with_pry_byebug'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-shell-0.6.4/lib/pry/shell/patches/pry_byebug.rb:67:in `start_with_pry_byebug'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_class.rb:194:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:78:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:16:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:106:in `perform'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor/command.rb:28:in `run'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor/invocation.rb:127:in `invoke_command'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command/base.rb:178:in `invoke_command'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor.rb:527:in `dispatch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command/base.rb:73:in `perform'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:71:in `block in invoke'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:149:in `with_argv'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:69:in `invoke'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands.rb:18:in `<main>'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/3.3.0/bundled_gems.rb:69:in `require'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/3.3.0/bundled_gems.rb:69:in `block (2 levels) in replace_require'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/bootsnap-1.18.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'",
"bin/rails:4:in `<main>'"]
This error is logged as a WARN, but the task is retried. Because the error is caused by the token limit, the retry fails with the same error, and the problem only gets worse as more refs are added to the Ai::ActiveContext::Queues::Code queue.
Cause
As indicated in the error message, Vertex AI text-embedding-005 has an API limit of 20,000 input tokens per request. This means we can't make a batch request in which the token counts of the individual contents add up to more than 20,000.
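For illustration, here is a minimal sketch of the constraint. The helper names and the ~4 characters-per-token ratio are assumptions for the example; an accurate count would come from the CountTokens API mentioned in the error message:
# Hypothetical helpers -- what matters is that the *sum* of token counts across
# the whole batch stays under the limit, not the size of any single content.
TOKEN_LIMIT = 20_000
CHARS_PER_TOKEN = 4.0 # rough assumption; code tends to tokenize more densely

def estimated_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

def batch_within_limit?(contents)
  contents.sum { |content| estimated_tokens(content) } <= TOKEN_LIMIT
end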
Further Context and Investigations
This issue came up when trying to generate embeddings for GitLab.com for the Initial Demo. Each content is chunked by byte_size=1000.
The call that specifically caused the error is:
# given a generate_embeddings method
def generate_embeddings(ids)
  ::Ai::ActiveContext::Collections::Code.track_refs!(routing: "1", hashes: ids)
  ::Ai::ActiveContext::BulkProcessWorker.new.perform("Ai::ActiveContext::Queues::Code", 0)
end

ref_ids = ids_on_elasticsearch
ref_ids.each_slice(50) do |ids|
  generate_embeddings(ids)
  sleep(2)
end
Further investigations show that:
generate_embeddings(ref_ids.slice(0, 11)) # no error on only 11 refs
generate_embeddings(ref_ids.slice(0, 20)) # no error on only 20 refs
generate_embeddings(ref_ids.slice(0, 30)) # error occurs with 30 refs
References
Proposal
Step 1 - Allow ActiveContext::Preprocessors::Embeddings::BATCH_SIZE to be configurable.
This will allow us to configure the batch size based on the token size of the code chunks in the vector store. We will likely need to estimate the token size from heuristics or from testing. For example, with a chunk size of 1000 bytes, Vertex AI can only handle a batch of roughly 20-25 inputs.
Suggested configuration approach:
Add batch_size to the Ai::ActiveContext::Collections::Code::MODELS array, and make sure the batch_size is passed to ActiveContext::Preprocessors::Embeddings#apply_embeddings. This can be done by passing the argument in the Ai::ActiveContext::References::Code call to apply_embeddings, as sketched below.
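A minimal sketch of that wiring, assuming a hypothetical shape for the MODELS entries and an added batch_size keyword on apply_embeddings (the real signatures in the gem may differ):
# Assumed shape of the per-model configuration (field names are illustrative):
MODELS = [
  { field: :embeddings_v1, model: 'text-embedding-005', batch_size: 20 }
].freeze

# In Ai::ActiveContext::References::Code, pass the configured size through to
# the preprocessor (the block shape is inferred from the backtrace above):
add_preprocessor :embeddings do |refs|
  apply_embeddings(refs, batch_size: MODELS.first[:batch_size])
end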
Step 2 - Have Gitlab::Llm::VertexAi::Embeddings::Text throw a specific error if token limit is reached
Since we do not have accurate per-chunk token counts, the batch_size calculation from Step 1 would be based on an estimate, so there is still a chance of hitting the "token limits exceeded" error. We need to make sure that VertexAi::Embeddings::Text communicates this error clearly.
We need to introduce a new error class in VertexAi::Embeddings::Text, e.g. VertexAi::Embeddings::Text::TokenLimitsExceeded.
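A hedged sketch of what this could look like inside Gitlab::Llm::VertexAi::Embeddings::Text; the helper that inspects the error response is hypothetical, since the exact error handling in Text#execute may differ:
# Sketch only -- new error class plus a hypothetical check on the error response.
TokenLimitsExceeded = Class.new(StandardError)

TOKEN_LIMIT_MESSAGE = /input token count is \d+ but the model supports up to \d+/

def raise_if_token_limit_exceeded!(error_messages)
  return unless error_messages.any? { |message| message.match?(TOKEN_LIMIT_MESSAGE) }

  raise TokenLimitsExceeded, error_messages.join(', ')
end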
Step 3 - Have ActiveContext handle the token limits exceeded error
- In ActiveContext::Embeddings, introduce an error that can be propagated to other ActiveContext classes. This should encapsulate the VertexAi::Embeddings::Text implementation from the rest of ActiveContext, since we want to be able to swap this out for another model in the future.
  - Proposed error name: BatchTooLarge
  - Further implementation details: ActiveContext::Embeddings should catch VertexAi::Embeddings::Text::TokenLimitsExceeded and raise it as ActiveContext::Embeddings::BatchTooLarge
- In ActiveContext::Preprocessors::Embeddings, implement a back-off that reduces the batch size (only for the current call) if it encounters an ActiveContext::Embeddings::BatchTooLarge error (a sketch of both pieces follows this list)
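A minimal sketch of both pieces, assuming hypothetical method names and signatures (the real generate_embeddings and apply_embeddings internals in the gem will differ):
# In ActiveContext::Embeddings (sketch; the signature is assumed):
BatchTooLarge = Class.new(StandardError)

def self.generate_embeddings(contents, user:, unit_primitive:, model: nil, action: nil)
  Gitlab::Llm::VertexAi::Embeddings::Text
    .new(contents, user: user, tracking_context: { action: action }, unit_primitive: unit_primitive, model: model)
    .execute
rescue Gitlab::Llm::VertexAi::Embeddings::Text::TokenLimitsExceeded => e
  # Keep the Vertex AI-specific error class encapsulated here
  raise BatchTooLarge, e.message
end

# In ActiveContext::Preprocessors::Embeddings (hypothetical back-off helper):
# halve the slice size for the current call only, and give up at a batch of 1.
def embeddings_with_backoff(contents, batch_size:, **options)
  contents.each_slice(batch_size).flat_map do |slice|
    ActiveContext::Embeddings.generate_embeddings(slice, **options)
  rescue ActiveContext::Embeddings::BatchTooLarge
    raise if batch_size <= 1

    embeddings_with_backoff(slice, batch_size: batch_size / 2, **options)
  end
end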