runway: Reduce CPU requests to 1, concurrency to 40 (!496)

Igor requested to merge tune into main Dec 12, 2023

We saw in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/375#note_1690418762 that Code Suggestions is concurrency-limited by the GIL. Since a process can't use more than one CPU, we don't need to request more than one.
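As a rough illustration of why extra CPUs don't help a GIL-bound process, here is a minimal sketch (not taken from the service itself) that times pure-Python CPU-bound work on one thread versus four. The function names `cpu_bound` and `run_threads` are hypothetical. It assumes a standard CPython build with the GIL enabled; free-threaded builds behave differently.

```python
import threading
import time


def cpu_bound(n: int) -> int:
    """Pure-Python arithmetic loop; holds the GIL for essentially its whole run."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(num_threads: int, work_per_thread: int) -> float:
    """Run num_threads CPU-bound workers concurrently and return wall-clock seconds."""
    threads = [
        threading.Thread(target=cpu_bound, args=(work_per_thread,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    work = 1_000_000
    one = run_threads(1, work)
    four = run_threads(4, work)
    # Under the GIL only one thread executes Python bytecode at a time, so four
    # threads doing 4x the work are expected to take roughly 4x as long: they
    # share a single core instead of running in parallel on four.
    print(f"1 thread:  {one:.2f}s")
    print(f"4 threads: {four:.2f}s (ratio {four / one:.1f}x)")
```

If the ratio comes out near 4x, the extra cores sat idle, which is the reasoning behind requesting a single CPU.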

We're also still seeing some instability; one symptom is a single instance with a high concurrent request count (the default concurrency limit is 80). Reducing concurrency to 40 spreads requests across more instances, so we don't put all our eggs in one basket.

We previously increased the max instance count to 200, and recently the actual instance count has peaked at only 25, so we should have plenty of headroom in max instances.
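The headroom argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes, as a worst case, that every instance at the observed peak of 25 was saturated at the old concurrency of 80, and that the autoscaler adds instances purely based on the per-instance concurrency limit; real autoscaling also considers other signals such as CPU utilization. The helper `instances_needed` is hypothetical.

```python
import math

OLD_CONCURRENCY = 80          # default max concurrent requests per instance
NEW_CONCURRENCY = 40          # value proposed in this MR
MAX_INSTANCES = 200           # current max instance count
OBSERVED_PEAK_INSTANCES = 25  # recent observed peak


def instances_needed(concurrent_requests: int, concurrency: int) -> int:
    """Minimum instances required to serve a load without exceeding per-instance concurrency."""
    return math.ceil(concurrent_requests / concurrency)


# Worst case: assume every instance at the peak was full at the old limit.
peak_requests = OBSERVED_PEAK_INSTANCES * OLD_CONCURRENCY
projected = instances_needed(peak_requests, NEW_CONCURRENCY)

print(f"worst-case peak load: {peak_requests} concurrent requests")
print(f"instances at concurrency {NEW_CONCURRENCY}: {projected} of {MAX_INSTANCES}")
```

Halving concurrency at most doubles the instance count (25 to 50 under these assumptions), which still leaves a large margin below 200.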

Edited Dec 12, 2023 by Igor
