Bulk code embeddings generation can exceed Vertex AI's total token limit
Context
Background
The Ai::ActiveContext::BulkProcessWorker gets a list of jobs from an ActiveContext queue and processes them. This worker is used in the code embeddings generation process to generate embeddings for a batch of contents by calling:
# `Ai::ActiveContext::Queues::Code` points to a queue containing a list of
# Elasticsearch documents, each of which has a field for raw code content
::Ai::ActiveContext::BulkProcessWorker.new.perform("Ai::ActiveContext::Queues::Code", 0)
In this scenario, specific to Ai::ActiveContext::Queues::Code, the BulkProcessWorker ultimately calls ActiveContext::Embeddings, which in turn calls Gitlab::Llm::VertexAi::Embeddings::Text#execute:
# note that `content` here is an *Array* of code snippets
embeddings = Gitlab::Llm::VertexAi::Embeddings::Text
  .new(content, user: user, tracking_context: { action: action }, unit_primitive: unit_primitive, model: model)
  .execute
Problem
If there are a lot of items in the Ai::ActiveContext::Queues::Code queue, the inputs to the VertexAi::Embeddings::Text bulk embeddings generation request could have a total token count that exceeds Vertex AI's limit, resulting in this error:
Retryable Error occurred: ["Unable to submit request because the input token count is 20546 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models\"]
Backtrace
["/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/lib/gitlab/llm/vertex_ai/embeddings/text.rb:29:in `execute'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/embeddings.rb:9:in `generate_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:82:in `block in generate_embeddings_for_each_version'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `each_with_object'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:81:in `generate_embeddings_for_each_version'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:48:in `block (3 levels) in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:44:in `each_value'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:44:in `block (2 levels) in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `each_slice'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:39:in `block in apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:60:in `with_batch_handling'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/preprocessors/embeddings.rb:22:in `apply_embeddings'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/lib/ai/active_context/references/code.rb:17:in `block in <class:Code>'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:26:in `block (2 levels) in preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:23:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:23:in `block in preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:19:in `each'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/preprocessor.rb:19:in `preprocess'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/reference.rb:38:in `preprocess_references'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:51:in `process'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:17:in `block in process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/redis/wrapper.rb:29:in `block in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:110:in `block (2 levels) in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:109:in `handle_interrupt'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:109:in `block in with'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:106:in `handle_interrupt'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/connection_pool-2.5.3/lib/connection_pool.rb:106:in `with'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/redis/wrapper.rb:29:in `with'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/redis.rb:6:in `with_redis'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:17:in `process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/bulk_process_queue.rb:6:in `process!'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:38:in `block in process_shard'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:43:in `block in in_lock'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:53:in `with_instrumentation'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/lib/gitlab/exclusive_lease_helpers.rb:42:in `in_lock'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:37:in `process_shard'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/gems/gitlab-active-context/lib/active_context/concerns/bulk_async_process.rb:22:in `perform'",
"/Users/pamartiaga/Code/gitlab-development-kit/gitlab/ee/app/workers/concerns/geo/skip_secondary.rb:14:in `perform'",
"(pry):14:in `generate_embeddings'",
"(pry):26:in `__pry__'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:290:in `eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:290:in `evaluate_ruby'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:659:in `handle_line'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:261:in `block (2 levels) in eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:260:in `catch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:260:in `block in eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:259:in `catch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_instance.rb:259:in `eval'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:77:in `block in repl'",
"<internal:kernel>:187:in `loop'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:67:in `repl'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:38:in `block in start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/input_lock.rb:61:in `__with_ownership'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/input_lock.rb:78:in `with_ownership'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:38:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/repl.rb:15:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-byebug-3.11.0/lib/pry-byebug/pry_ext.rb:15:in `start_with_pry_byebug'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-shell-0.6.4/lib/pry/shell/patches/pry_byebug.rb:67:in `start_with_pry_byebug'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/pry-0.14.2/lib/pry/pry_class.rb:194:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:78:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:16:in `start'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands/console/console_command.rb:106:in `perform'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor/command.rb:28:in `run'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor/invocation.rb:127:in `invoke_command'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command/base.rb:178:in `invoke_command'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/thor-1.3.1/lib/thor.rb:527:in `dispatch'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command/base.rb:73:in `perform'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:71:in `block in invoke'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:149:in `with_argv'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/command.rb:69:in `invoke'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/railties-7.1.5.1/lib/rails/commands.rb:18:in `<main>'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/3.3.0/bundled_gems.rb:69:in `require'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/3.3.0/bundled_gems.rb:69:in `block (2 levels) in replace_require'",
"/Users/pamartiaga/.asdf/installs/ruby/3.3.8/lib/ruby/gems/3.3.0/gems/bootsnap-1.18.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'",
"bin/rails:4:in `<main>'"]
This error is logged as a WARN, but the task is retried. Because the error is caused by the token limit, the retry fails with the same error, and the problem only gets worse as more refs are added to the Ai::ActiveContext::Queues::Code queue.
Cause
As indicated in the error message, Vertex AI text-embedding-005 has an API limit of 20,000 input tokens per request. This means we can't make a batch request in which the token counts of the individual contents add up to more than 20,000.
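For illustration, here is a minimal sketch of the constraint. The helper names and the ~4 characters-per-token ratio are assumptions for the example; an accurate count would come from the CountTokens API mentioned in the error message:
# Hypothetical helpers -- what matters is that the *sum* of token counts across
# the whole batch stays under the limit, not the size of any single content.
TOKEN_LIMIT = 20_000
CHARS_PER_TOKEN = 4.0 # rough assumption; code tends to tokenize more densely

def estimated_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

def batch_within_limit?(contents)
  contents.sum { |content| estimated_tokens(content) } <= TOKEN_LIMIT
end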
Further Context and Investigations
This issue came up when trying to generate embeddings for GitLab.com for the Initial Demo. Each content is chunked by byte_size=1000.
The call that specifically caused the error is:
# given a generate_embeddings method
def generate_embeddings(ids)
  ::Ai::ActiveContext::Collections::Code.track_refs!(routing: "1", hashes: ids)
  ::Ai::ActiveContext::BulkProcessWorker.new.perform("Ai::ActiveContext::Queues::Code", 0)
end

ref_ids = ids_on_elasticsearch
ref_ids.each_slice(50) do |ids|
  generate_embeddings(ids)
  sleep(2)
end
Further investigations show that:
generate_embeddings(ref_ids.slice(0, 11)) # no error on only 11 refs
generate_embeddings(ref_ids.slice(0, 20)) # no error on only 20 refs
generate_embeddings(ref_ids.slice(0, 30)) # error occurs with 30 refs
References
Proposal
Step 1 - Allow ActiveContext::Preprocessors::Embeddings::BATCH_SIZE to be configurable.
This will allow us to configure the batch size based on the token size of the code chunks in the vector store. We will likely need to estimate the token size from heuristics or from testing. For example, with a chunk size of 1000 bytes, Vertex AI can only handle a batch of roughly 20-25 inputs.
Suggested configuration approach:
Add batch_size to the Ai::ActiveContext::Collections::Code::MODELS array, and make sure the batch_size is passed to ActiveContext::Preprocessors::Embeddings#apply_embeddings. This can be done by passing the argument in the Ai::ActiveContext::References::Code call to apply_embeddings, as sketched below.
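A minimal sketch of that wiring, assuming a hypothetical shape for the MODELS entries and an added batch_size keyword on apply_embeddings (the real signatures in the gem may differ):
# Assumed shape of the per-model configuration (field names are illustrative):
MODELS = [
  { field: :embeddings_v1, model: 'text-embedding-005', batch_size: 20 }
].freeze

# In Ai::ActiveContext::References::Code, pass the configured size through to
# the preprocessor (the block shape is inferred from the backtrace above):
add_preprocessor :embeddings do |refs|
  apply_embeddings(refs, batch_size: MODELS.first[:batch_size])
end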
Step 2 - Have Gitlab::Llm::VertexAi::Embeddings::Text throw a specific error if token limit is reached
Since we do not have accurate per-chunk token counts, the batch_size calculation from Step 1 would be based on an estimate, so there is still a chance of hitting the "token limits exceeded" error. We need to make sure that VertexAi::Embeddings::Text communicates this error clearly.
We need to introduce a new error class in VertexAi::Embeddings::Text, e.g. VertexAi::Embeddings::Text::TokenLimitsExceeded.
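A hedged sketch of what this could look like inside Gitlab::Llm::VertexAi::Embeddings::Text; the helper that inspects the error response is hypothetical, since the exact error handling in Text#execute may differ:
# Sketch only -- new error class plus a hypothetical check on the error response.
TokenLimitsExceeded = Class.new(StandardError)

TOKEN_LIMIT_MESSAGE = /input token count is \d+ but the model supports up to \d+/

def raise_if_token_limit_exceeded!(error_messages)
  return unless error_messages.any? { |message| message.match?(TOKEN_LIMIT_MESSAGE) }

  raise TokenLimitsExceeded, error_messages.join(', ')
end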
Step 3 - Have ActiveContext handle the token limits exceeded error
- In ActiveContext::Embeddings, introduce an error that can be propagated to other ActiveContext classes. This should encapsulate the VertexAi::Embeddings::Text implementation from the rest of ActiveContext, since we want to be able to swap this out for another model in the future.
  - Proposed error name: BatchTooLarge
  - Further implementation details: ActiveContext::Embeddings should catch VertexAi::Embeddings::Text::TokenLimitsExceeded and raise it as ActiveContext::Embeddings::BatchTooLarge
- In ActiveContext::Preprocessors::Embeddings, implement a back-off that reduces the batch size (only for the current call) if it encounters an ActiveContext::Embeddings::BatchTooLarge error (a sketch of both pieces follows this list)
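A minimal sketch of both pieces, assuming hypothetical method names and signatures (the real generate_embeddings and apply_embeddings internals in the gem will differ):
# In ActiveContext::Embeddings (sketch; the signature is assumed):
BatchTooLarge = Class.new(StandardError)

def self.generate_embeddings(contents, user:, unit_primitive:, model: nil, action: nil)
  Gitlab::Llm::VertexAi::Embeddings::Text
    .new(contents, user: user, tracking_context: { action: action }, unit_primitive: unit_primitive, model: model)
    .execute
rescue Gitlab::Llm::VertexAi::Embeddings::Text::TokenLimitsExceeded => e
  # Keep the Vertex AI-specific error class encapsulated here
  raise BatchTooLarge, e.message
end

# In ActiveContext::Preprocessors::Embeddings (hypothetical back-off helper):
# halve the slice size for the current call only, and give up at a batch of 1.
def embeddings_with_backoff(contents, batch_size:, **options)
  contents.each_slice(batch_size).flat_map do |slice|
    ActiveContext::Embeddings.generate_embeddings(slice, **options)
  rescue ActiveContext::Embeddings::BatchTooLarge
    raise if batch_size <= 1

    embeddings_with_backoff(slice, batch_size: batch_size / 2, **options)
  end
end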