Determining thresholds for database duration limits in Sidekiq throttling for gitlab.com

Based on https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3818, we need to determine limits for db_main_duration_s and db_ci_duration_s across Sidekiq workers. We may need a default limit plus custom limits for groups of workers with higher-than-average DB durations.

See gitlab-org/gitlab!168193 (closed)

Plan

  1. Roll out suggested limits and top 10 db duration workers
  2. Track behaviour over 1-2 weeks, taking special note of behaviour around incidents (if any)
  3. Adjust limits and list of high database usage workers

Update: Instead of predetermining high DB duration workers, we'll use urgency-based limits.

Suggested limits

For the first iteration, let's proceed with the following levels. Since we are not actually throttling yet, we can adjust them as we observe the application rate limit dashboard.

main/ci/sec databases

  • 20,000 db seconds / minute as the default
  • 100,000 db seconds / minute for urgent workers
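
As a rough illustration of how the two levels above could be applied in a track-only mode, here is a minimal Python sketch. It is not the GitLab implementation: the `DbDurationTracker` class, the fixed one-minute window, and the mapping of "urgent" to an urgency value of `"high"` are all assumptions made for this example.

```python
from collections import defaultdict

# Limits from the list above, in DB seconds per minute.
DEFAULT_LIMIT = 20_000
URGENT_LIMIT = 100_000


def limit_for(urgency: str) -> int:
    """Return the per-minute DB duration limit for a worker urgency.

    Assumption: "urgent workers" are those whose urgency is "high".
    """
    return URGENT_LIMIT if urgency == "high" else DEFAULT_LIMIT


class DbDurationTracker:
    """Hypothetical tracker that sums DB seconds per (worker, database)
    in fixed one-minute windows. It only reports; it never throttles."""

    def __init__(self) -> None:
        self._usage = defaultdict(float)

    def record(self, worker: str, database: str, duration_s: float, now_s: float) -> float:
        """Add a job's DB duration and return the running total for the window."""
        key = (worker, database, int(now_s // 60))
        self._usage[key] += duration_s
        return self._usage[key]

    def over_limit(self, worker: str, database: str, urgency: str, now_s: float) -> bool:
        """True if the worker's usage in the current window exceeds its limit."""
        key = (worker, database, int(now_s // 60))
        return self._usage.get(key, 0.0) > limit_for(urgency)
```

Because the check is separate from recording, the same counters can feed a dashboard first and an actual throttling decision later, once the limits have been tuned.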

The values were derived from the observation that incident-level usage has always surpassed 100k, while no worker under normal conditions exceeds 100k (except for a very occasional spike); see #3859 (comment 2547681794).

Edited by Marco Gregorius