Add Redis timing buckets of 1s and 2s
What does this MR do?
Most transactions spend far less than a second in Redis, but some Sidekiq jobs have p95 timings over half a second and p99 timings over a second. (These values are from structured logging, which records exact timings.)
Prometheus histograms only record how many observations fall under each bucket boundary, so adding 1s and 2s buckets gives us a better picture of this slow tail in Prometheus, too.
I got these values from https://log.gprd.gitlab.net/goto/efc94043d4992ef617e77b6c43f9f592.
Queue (json.queue.keyword) | p50 json.redis_duration_s (s) | p90 json.redis_duration_s (s) | p95 json.redis_duration_s (s) | p99 json.redis_duration_s (s) |
---|---|---|---|---|
authorized_project_update:authorized_project_update_user_refresh_with_low_urgency | 0.069 | 0.43 | 0.61 | 1.184 |
repository_update_mirror | 0.012 | 0.168 | 0.325 | 0.974 |
authorized_projects | 0.093 | 0.368 | 0.515 | 0.929 |
update_project_statistics | 0.008 | 0.118 | 0.278 | 0.821 |
detect_repository_languages | 0.007 | 0.101 | 0.233 | 0.706 |
post_receive | 0.017 | 0.089 | 0.159 | 0.537 |
jira_importer:jira_import_import_issue | 0.003 | 0.045 | 0.122 | 0.536 |
When I checked last week, though, more than one queue exceeded 1s at p99: gitlab-com/gl-infra/scalability#411 (comment 359678881)
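
As a rough illustration of why the extra buckets help, the sketch below counts the p99 values from the table above into cumulative buckets, the way a Prometheus histogram does. The pre-existing boundaries in `OLD_BUCKETS` are an assumption made up for the example (this description does not list the current buckets), and `bucket_counts` is a hypothetical helper, not code from this MR.

```python
# p99 Redis durations (seconds) copied from the table above.
P99_SAMPLES = [1.184, 0.974, 0.929, 0.821, 0.706, 0.537, 0.536]

# Assumed pre-existing boundaries, for illustration only.
OLD_BUCKETS = [0.1, 0.25, 0.5]
# The same boundaries plus the 1s and 2s buckets this MR adds.
NEW_BUCKETS = OLD_BUCKETS + [1.0, 2.0]


def bucket_counts(observations, upper_bounds):
    """Return cumulative counts per upper bound, including +Inf.

    Each observation is counted in every bucket whose upper bound is
    greater than or equal to it, as in a Prometheus histogram.
    """
    bounds = sorted(upper_bounds) + [float("inf")]
    counts = {bound: 0 for bound in bounds}
    for value in observations:
        for bound in bounds:
            if value <= bound:
                counts[bound] += 1
    return counts


if __name__ == "__main__":
    for name, buckets in (("old", OLD_BUCKETS), ("new", NEW_BUCKETS)):
        print(name, bucket_counts(P99_SAMPLES, buckets))
```

With the assumed old boundaries, every sample lands only in the +Inf bucket, so all Prometheus can say is that these timings are above 0.5s. With the 1s and 2s buckets, the one value above a second is separated from the rest.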