Reduce batch size for text-embedding-005 requests

What does this MR do and why?

Batched embedding generation requests for text-embedding-005 are hitting a non-trivial volume of 4xx errors because the combined input exceeds the model's token limit:

Unable to submit request because the input token count is 21483 but the model supports up to 20000 on the text-embedding-005

Rails handles these errors and retries with a lower batch size, but the logged errors can cause confusion when investigating other errors. This MR pre-emptively reduces the batch size so that fewer requests exceed the token limit in the first place.
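
To illustrate why a smaller starting batch size produces fewer logged errors, here is a minimal sketch of the retry-with-smaller-batch pattern described above. It is not the code changed in this MR: `TokenLimitExceeded`, `fake_embed`, `estimate_tokens`, and `embed_all` are hypothetical names, the one-token-per-word estimate is an assumption, and the batch sizes in the usage note are illustrative only. The 20000 limit is taken from the error message quoted above.

```ruby
# Hypothetical error standing in for the 4xx token-limit response.
class TokenLimitExceeded < StandardError; end

TOKEN_LIMIT = 20_000 # limit quoted in the error message above

# Assumption: crude estimate of one token per whitespace-separated word.
def estimate_tokens(text)
  text.split.size
end

# Stand-in for the embeddings call: raises when the batch exceeds the limit.
def fake_embed(batch)
  total = batch.sum { |text| estimate_tokens(text) }
  raise TokenLimitExceeded, "input token count is #{total}" if total > TOKEN_LIMIT

  batch.map { |text| [estimate_tokens(text)] } # dummy one-element "embedding"
end

# Embeds texts in slices of batch_size, halving a slice when it hits the limit.
# Returns the embeddings (in input order) and the number of failed requests.
def embed_all(texts, batch_size:)
  errors = 0
  embeddings = []
  pending = texts.each_slice(batch_size).to_a

  until pending.empty?
    batch = pending.shift
    begin
      embeddings.concat(fake_embed(batch))
    rescue TokenLimitExceeded
      errors += 1
      raise if batch.size == 1 # a single oversized input cannot be split further

      half = (batch.size / 2.0).ceil
      pending.unshift(*batch.each_slice(half).to_a) # retry both halves first
    end
  end

  [embeddings, errors]
end
```

With 40 inputs of 1000 words each, `embed_all(texts, batch_size: 40)` fails once before the retry succeeds, while `embed_all(texts, batch_size: 20)` completes without any failed request. Both calls return the same embeddings; only the number of logged errors differs.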

References

Screenshots or screen recordings

N/A

How to set up and validate locally

N/A

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #590730 (closed)

Edited by Pam Artiaga
