runway: Reduce CPU requests to 1, concurrency to 40

Igor requested to merge tune into main

We saw in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/375#note_1690418762 that code suggestions is concurrency-limited by the GIL. We can't use more than one CPU, so there's no point requesting more than one.

We're also still seeing some instability. One symptom is a single instance with a high number of concurrent requests (the default concurrency limit is 80). By reducing concurrency, we spread our eggs across more baskets: the same load is distributed over more instances.

We previously increased the max instance count to 200, and recently the actual instance count has peaked at only 25. Halving concurrency should roughly double the number of instances needed for the same load, so we should still have plenty of headroom under the max.
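A rough capacity check for the headroom claim, written as a small Python sketch. It assumes, pessimistically, that all 25 instances at the observed peak were running at the full default concurrency of 80. The helper `instances_needed` and the constant names are hypothetical and only illustrate the arithmetic.

```python
import math

# Figures taken from this MR.
OLD_CONCURRENCY = 80           # default concurrency limit
NEW_CONCURRENCY = 40           # value proposed here
MAX_INSTANCES = 200            # current max instance count
OBSERVED_PEAK_INSTANCES = 25   # recent peak of actual instances


def instances_needed(peak_concurrent_requests: int, concurrency: int) -> int:
    """Hypothetical helper: instances required if each accepts at most `concurrency` requests."""
    return math.ceil(peak_concurrent_requests / concurrency)


# Worst-case assumption: every instance at the peak was fully saturated.
peak_requests = OBSERVED_PEAK_INSTANCES * OLD_CONCURRENCY
needed = instances_needed(peak_requests, NEW_CONCURRENCY)
headroom = MAX_INSTANCES / needed

print(f"peak concurrent requests (worst case): {peak_requests}")
print(f"instances needed at concurrency {NEW_CONCURRENCY}: {needed}")
print(f"headroom vs max instances: {headroom:.1f}x")
```

Under that worst case, roughly 50 instances would be needed at concurrency 40, which is still about a quarter of the 200-instance limit. Real traffic is likely below full saturation, so the actual instance count should be lower.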
