Adjust redis cache metrics
Summary
We currently have a single cache histogram for Redis backends, gitlab_cache_operation_duration_seconds.
This histogram is labeled by controller and action, which leads to quite a lot of cardinality.
Improvements
To reduce the overall metrics load, and improve visibility, we should make the following changes:
- Remove the controller and action labels from the timing histogram.
- Adjust the Redis cache histogram buckets to be more useful.
- Create a new counter, gitlab_cache_operations_total, with controller and action labels.
- Create a new counter, gitlab_cache_operation_seconds_total, with controller and action labels.
- Create a new counter, gitlab_cache_operation_bytes_total, with controller and action labels.
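The proposed split can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in rails_cache.rb: SketchCounter, SketchHistogram, CacheMetricsSketch, the operation label, and the bucket bounds are all assumptions made up for the example, standing in for whatever Prometheus client wrapper the application actually uses.

```ruby
# Hypothetical in-memory stand-in for a Prometheus counter (not a real client API).
class SketchCounter
  def initialize
    @values = Hash.new(0.0)
  end

  # Add `by` to the series identified by this label set.
  def increment(labels, by = 1)
    @values[labels] += by
  end

  def get(labels)
    @values[labels]
  end
end

# Hypothetical in-memory stand-in for a histogram. Counts are stored per bucket
# (non-cumulative); a real Prometheus exposition reports cumulative buckets.
class SketchHistogram
  def initialize(buckets)
    @buckets = buckets.sort
    @counts = Hash.new { |hash, key| hash[key] = Array.new(@buckets.size + 1, 0) }
  end

  def observe(labels, value)
    # First bucket whose upper bound holds the value; the last slot is +Inf.
    index = @buckets.index { |upper| value <= upper } || @buckets.size
    @counts[labels][index] += 1
  end

  def bucket_counts(labels)
    @counts[labels]
  end
end

class CacheMetricsSketch
  attr_reader :duration, :operations_total, :seconds_total, :bytes_total

  def initialize
    # Illustrative bucket bounds in seconds; real values would need tuning.
    @duration = SketchHistogram.new([0.001, 0.01, 0.1, 1.0])
    @operations_total = SketchCounter.new
    @seconds_total = SketchCounter.new
    @bytes_total = SketchCounter.new
  end

  def record(operation:, controller:, action:, duration:, bytes:)
    # Timing histogram: no controller/action labels, so its cardinality stays small.
    # The `operation` label is an assumption for this sketch.
    @duration.observe({ operation: operation }, duration)

    # Per-request trends live in cheap counters instead of per-bucket series.
    request_labels = { controller: controller, action: action }
    @operations_total.increment(request_labels)
    @seconds_total.increment(request_labels, duration)
    @bytes_total.increment(request_labels, bytes)
  end
end
```

With counters like these, the average cache time per operation for a given controller and action can still be derived by dividing the seconds counter by the operations counter, while latency distribution comes from the unlabeled histogram.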
This will cut the metrics load in half by eliminating the per-controller/action buckets, which should allow us to increase the number of buckets for Redis timing data.
It will also continue to provide general trends for controller/action requests.
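To see why moving controller and action off the histogram helps, a rough series count is useful. A Prometheus histogram exposes one series per configured bucket plus the +Inf bucket, _sum, and _count for every label combination, while a counter exposes a single series per combination. The figures below (500 controller/action pairs, 3 current buckets, 12 proposed buckets) are made-up inputs for illustration, not measurements from GitLab.com.

```ruby
# Series exposed by a histogram: each label combination gets one series per
# configured bucket, plus the +Inf bucket, _sum, and _count.
def histogram_series(label_combinations, bucket_count)
  label_combinations * (bucket_count + 3)
end

# A counter exposes exactly one series per label combination.
def counter_series(label_combinations)
  label_combinations
end

pairs = 500 # hypothetical number of controller/action combinations

# Today: one histogram labeled by controller and action (3 buckets assumed).
before = histogram_series(pairs, 3)

# Proposed: one unlabeled histogram with more buckets (12 assumed),
# plus three counters labeled by controller and action.
after = histogram_series(1, 12) + 3 * counter_series(pairs)

puts "before: #{before} series, after: #{after} series"
```

Under these assumed inputs the histogram can quadruple its bucket count while the total number of series still drops, because the bucket count is no longer multiplied by the number of controller/action pairs. The real ratio depends on the actual bucket and label counts.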
Risks
This is low risk: the current Redis buckets are not well tuned, so we don't get very useful data out of them.
Involved components
gitlab/metrics/subscribers/rails_cache.rb
Optional: Intended side effects
In GitLab.com production, these metrics account for 23% of all Ruby service (Unicorn/Sidekiq) metrics.
Edited by Ben Kochie