Determining thresholds for database duration limits in Sidekiq throttling for gitlab.com
Based on https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3818, we need to determine limits for db_main_duration_s and db_ci_duration_s across Sidekiq workers. We may need a default limit and custom limits for groups of workers with higher than average db durations.
See gitlab-org/gitlab!168193 (closed)
Plan
- Roll out suggested limits and top 10 db duration workers
- Track behaviour over 1-2 weeks, taking special note of behaviour around incidents (if any)
- Adjust limits and list of high database usage workers
Update: Instead of predetermining high DB duration workers, we'll use urgency-based limits.
Suggested limits
For the first iteration, let's proceed with the following levels. Since we are not actually throttling yet, we can adjust them as we observe the application rate limit dashboard.
main/ci/sec database
- 20,000 db seconds / minute as the default
- 100,000 db seconds / minute for urgent workers
The values were derived from the observation that incident-level usage has always surpassed 100k, while no worker under normal conditions exceeds 100k (except for a very occasional spike). See #3859 (comment 2547681794).
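To make the urgency-based selection concrete, here is a minimal sketch of how the two suggested limits could be applied per database. It is not the actual GitLab implementation: the function names, the urgency label "high" standing in for "urgent workers", and the shape of the usage input are all assumptions made for illustration. Because nothing is throttled in this iteration, the sketch only reports which databases would be over the limit.

```python
# Suggested limits from this issue, in db seconds per minute.
DEFAULT_LIMIT = 20_000
URGENT_LIMIT = 100_000


def limit_for(urgency: str) -> int:
    """Return the db duration limit for a worker's urgency.

    Assumption: urgent workers are labelled "high"; every other
    urgency falls back to the default limit.
    """
    return URGENT_LIMIT if urgency == "high" else DEFAULT_LIMIT


def over_limit_databases(urgency: str, db_seconds_per_minute: dict) -> list:
    """List the databases (e.g. "main", "ci", "sec") whose db seconds
    in the last minute exceed the limit for this urgency.

    Hypothetical helper: it only reports, it does not throttle.
    """
    limit = limit_for(urgency)
    return [
        database
        for database, seconds in db_seconds_per_minute.items()
        if seconds > limit
    ]


if __name__ == "__main__":
    # Made-up usage numbers for illustration only.
    usage = {"main": 25_000.0, "ci": 1_000.0, "sec": 0.0}
    print(over_limit_databases("low", usage))
    print(over_limit_databases("high", usage))
```

With the made-up numbers above, a non-urgent worker would be flagged on main only, while an urgent worker with the same usage would stay under its 100,000 limit.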