Reduce batch size for text-embedding-005 requests
What does this MR do and why?
Batched embedding-generation requests for text-embedding-005 are hitting a non-trivial volume of 4xx errors for exceeded input token limits:
```
Unable to submit request because the input token count is 21483 but the model supports up to 20000 on the text-embedding-005
```
While these errors are already handled in Rails by retrying with a lower batch size, the logged errors can cause confusion when investigating unrelated failures. In this MR, we pre-emptively reduce the batch size to cut down the number of these errors.
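As a rough illustration of the idea (not the actual GitLab implementation), batches can be packed so that their estimated token totals stay safely below the model's limit. The constants, the 4-characters-per-token heuristic, and the helper names below are all assumptions for the sketch; only the 20,000-token limit comes from the error message above.

```ruby
# Hypothetical sketch: cap batch sizes so the estimated token total stays
# safely under the model limit. Names and numbers are illustrative.

TOKEN_LIMIT = 20_000 # text-embedding-005 input token limit (from the error above)
SAFETY_MARGIN = 0.8  # stay well below the limit to avoid retried 4xx errors
BATCH_TOKEN_BUDGET = (TOKEN_LIMIT * SAFETY_MARGIN).to_i

# Rough token estimate: ~4 characters per token (a common heuristic,
# not the model's real tokenizer).
def estimated_tokens(text)
  (text.length / 4.0).ceil
end

# Greedily pack texts into batches whose estimated token totals
# fit within the budget.
def build_batches(texts, budget: BATCH_TOKEN_BUDGET)
  batches = [[]]
  current_total = 0

  texts.each do |text|
    tokens = estimated_tokens(text)
    if current_total + tokens > budget && batches.last.any?
      batches << []
      current_total = 0
    end
    batches.last << text
    current_total += tokens
  end

  batches
end
```

With a budget of 16,000 estimated tokens, two ~10,000-token texts would land in separate batches instead of triggering a 4xx and a retry.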
References
- Investigation and discussion: [ActiveContext Code] Token count exceeding limit (gitlab-org#20977 - closed)
Screenshots or screen recordings
N/A
How to set up and validate locally
N/A
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #590730 (closed)
Edited by Pam Artiaga