Gitaly adaptive concurrency limit
DRI: @qmnguyen0711

## Blueprint

The blueprint, which follows the [Architecture Evolution workflow](https://about.gitlab.com/handbook/engineering/architecture/workflow/), can be found here: https://docs.gitlab.com/ee/architecture/blueprints/gitaly_adaptive_concurrency_limit/

## Execution Plan

The proposal covers many things, and it's too risky to do everything in one go. Therefore, I would like to make the execution iterative.

- ✅ Already done: static concurrency limit.
- ✅ Phase 1: Implement an adaptive concurrency limit based on cgroup resource usage.
  - This phase makes the concurrency limit float according to resource usage. It lays the foundation for later phases.
  - It is applied to the pack-objects limiter first. Focusing on this limiter brings value sooner.
  - It is also applied to the per-RPC concurrency limiter.
  - We should treat observability more seriously.
- Phase 2: Take latency measurement into account.
  - Implement latency measurement following either the TCP Vegas or the Gradient algorithm. Because neither algorithm is easy to comprehend, we may need extra steps for dry runs and validation on production.
  - In the meantime, integrate the adaptive concurrency limit into the per-RPC limiters.
  - Finally, enable latency measurement.

## Status 2023-09-28

Update:

* The resizable semaphore MRs were all merged. The feature flag for rolling it out is also [enabled globally](https://gitlab.com/gitlab-org/gitaly/-/issues/5581) on GitLab.com. It's the infrastructure for the actual adaptiveness support that comes later.
* The adaptiveness support for pack-objects limiting ([!6411](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6411)) was merged. Self-managed instances can try this feature out, but we don't want to broadcast it before it's enabled widely on GitLab.com.
* The adaptiveness support for per-RPC limiting is being reviewed ([!6412](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6412) and [!6418](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6418)).

Upcoming:

* [Calibrate adaptive limiting settings for GitLab.com](https://gitlab.com/gitlab-org/gitaly/-/issues/5610)
* Add observability to track the feature rollout on GitLab.com.
* Pick some target nodes and roll this feature out.

## Status 2023-08-18

Although all the core implementations were done, we had to roll back the integration MR due to an incident: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16187. The root cause was the new resizable semaphore implementation: the limiter didn't clean up the semaphores properly, which led to goroutine leaks. This epic is blocked until https://gitlab.com/gitlab-org/gitaly/-/issues/5523+ is resolved.