Shared Runner Managers are possibly underprovisioned
Shared runner managers need to be scaled up. On one host, for example, tasks spend 1.2s waiting for a CPU for every second that they run. CPU is pinned near 100% for much of the day, and the 15-minute load average is around 2 per core.

# Details

`node_schedstat_waiting_seconds_total` on `shared-runners-manager-3.gitlab.com` is up at 125%.

![image](/uploads/7fd03091d0a13d9b1980149e0cbf23f5/image.png)

https://prometheus.gprd.gitlab.net/graph?g0.expr=max%20by%20(fqdn)%20(rate(node_schedstat_waiting_seconds_total%7Bfqdn%3D%22shared-runners-manager-3.gitlab.com%22%2C%20type%3D%22ci-runners%22%7D%5B1h%5D)*100)%0A%0A&g0.tab=0&g0.stacked=0&g0.range_input=2w

https://dashboards.gitlab.net/d/ci-runners-main/ci-runners-overview?orgId=1

cc @dawsmith @tmaczukin
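
For anyone checking the numbers, here is a rough sketch of the arithmetic behind the figures above: the Prometheus expression takes the per-second rate of the `node_schedstat_waiting_seconds_total` counter over a window and multiplies it by 100. The function names, the sample counter values, and the core count are all hypothetical and chosen only to illustrate the calculation; they are not measurements from the host.

```python
def waiting_percent(wait_start: float, wait_end: float, elapsed_seconds: float) -> float:
    """Percent of wall-clock time that tasks spent waiting for a CPU.

    Mirrors rate(counter[window]) * 100: the increase of a
    waiting-seconds counter divided by the window length.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (wait_end - wait_start) / elapsed_seconds * 100


def load_per_core(load15: float, cores: int) -> float:
    """15-minute load average normalised by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load15 / cores


if __name__ == "__main__":
    # Hypothetical samples: the counter grew by 4500 s over a 1 h window.
    print(waiting_percent(0.0, 4500.0, 3600.0))
    # Hypothetical host: load15 of 16 on 8 cores.
    print(load_per_core(16.0, 8))
```

A value above 100% means the run queue accumulated more waiting time than wall-clock time elapsed, which is consistent with tasks queueing behind each other for a CPU.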
issue