maintenance: Introduce rate limiting for `OptimizeRepo`

Patrick Steinhardt requested to merge pks-maintenance-resumable-walk into master

With the current implementation of OptimizeRepo, we try to do as much work as possible within the timeframe dictated by the maintenance schedule. This has proven problematic even in times of reduced load on deployments, because we were essentially DoS'ing ourselves with so many requests that alerts triggered. On staging systems, we see more than 200 calls per second for the OptimizeRepository RPC. Even if those jobs don't need to do any heavy repacking, they still cause heavy read load on Gitaly nodes just to determine that nothing needs to be done.

As a first iteration towards improving this, this commit introduces rate limiting to the maintenance job. Instead of going as quickly as we can, we limit the job to one request per second.

Of course, this means that we can now optimize fewer repositories than we previously did. But this is still much better than driving a DoS against ourselves and waking up SREs on weekends. Depending on the typical load we'll see with this rate limiting, we may be able to enable the maintenance task 24/7.