Reduce batch size for text-embedding-005 requests
What does this MR do and why?
Batched embedding-generation requests for text-embedding-005 are hitting a non-trivial volume of 4xx errors for exceeded input token limits:
```
Unable to submit request because the input token count is 21483 but the model supports up to 20000 on the text-embedding-005
```
While these errors are already handled in Rails by retrying with a lower batch size, the logged errors can cause confusion when investigating unrelated failures. In this MR, we pre-emptively reduce the batch size to cut down the number of these errors.
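As a rough illustration of the idea (not the actual GitLab implementation), batches can be packed so that their estimated token totals stay safely below the model's limit. The constants, the 4-characters-per-token heuristic, and the helper names below are all assumptions for the sketch; only the 20,000-token limit comes from the error message above.

```ruby
# Hypothetical sketch: cap batch sizes so the estimated token total stays
# safely under the model limit. Names and numbers are illustrative.

TOKEN_LIMIT = 20_000 # text-embedding-005 input token limit (from the error above)
SAFETY_MARGIN = 0.8  # stay well below the limit to avoid retried 4xx errors
BATCH_TOKEN_BUDGET = (TOKEN_LIMIT * SAFETY_MARGIN).to_i

# Rough token estimate: ~4 characters per token (a common heuristic,
# not the model's real tokenizer).
def estimated_tokens(text)
  (text.length / 4.0).ceil
end

# Greedily pack texts into batches whose estimated token totals
# fit within the budget.
def build_batches(texts, budget: BATCH_TOKEN_BUDGET)
  batches = [[]]
  current_total = 0

  texts.each do |text|
    tokens = estimated_tokens(text)
    if current_total + tokens > budget && batches.last.any?
      batches << []
      current_total = 0
    end
    batches.last << text
    current_total += tokens
  end

  batches
end
```

With a budget of 16,000 estimated tokens, two ~10,000-token texts would land in separate batches instead of triggering a 4xx and a retry.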
References
- Investigation and discussion: [ActiveContext Code] Token count exceeding limit (gitlab-org#20977 - closed)
Screenshots or screen recordings
N/A
How to set up and validate locally
N/A
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #590730 (closed)
Edited by Pam Artiaga