Update histogram buckets for Banzai cacheless_render metrics
What does this MR do and why?
Full context is in https://gitlab.com/gitlab-org/plan/-/issues/369#note_638946301
gitlab-com/runbooks!3963 (merged) added 95th and 99th percentiles to the Banzai cacheless rendering durations chart. However, the chart is inaccurate due to the bucket sizes used to record the duration metric.
This MR changes the measurement from Gitlab::Metrics.measure to Gitlab::Metrics.histogram, using bucket sizes calculated roughly from Kibana. The current spread of buckets is [0.001, 0.01, 0.1, 1].
This change adds more granularity in the 0.01-1s range and a higher maximum to improve overall accuracy: [0.01, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100]
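To illustrate why the bucket spread matters, here is a minimal sketch in plain Ruby (it does not use Gitlab::Metrics). It assumes the usual Prometheus histogram semantics, where an observation counts toward the smallest upper bound (le) that is greater than or equal to it, and anything above the largest finite bound is only visible in the +Inf bucket. The helper name bucket_for and the sample durations are hypothetical and chosen only for illustration.

```ruby
# Bucket boundaries (in seconds) as quoted in this description.
OLD_BUCKETS = [0.001, 0.01, 0.1, 1].freeze
NEW_BUCKETS = [0.01, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100].freeze

# Hypothetical helper: returns the smallest bucket upper bound that contains
# the duration, or Float::INFINITY when the duration exceeds every finite
# bound (i.e. it only lands in the +Inf bucket).
# uniq collapses the repeated 0.01 in the new list.
def bucket_for(duration, buckets)
  buckets.uniq.sort.find { |upper| duration <= upper } || Float::INFINITY
end

# Hypothetical durations, not real measurements.
SAMPLE_DURATIONS = [0.03, 0.4, 1.8, 7.0, 42.0].freeze

SAMPLE_DURATIONS.each do |d|
  puts format('%5.2fs  old: le=%-8s new: le=%s',
              d, bucket_for(d, OLD_BUCKETS), bucket_for(d, NEW_BUCKETS))
end
```

With the old boundaries, every duration above 1s falls into the +Inf bucket, so a percentile estimated from the buckets cannot distinguish a 1.8s render from a 42s one. With the new boundaries, those durations land in separate finite buckets.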
These are based on measurements from Kibana. As far as I can see, we don't have measurements of Banzai rendering time in structured logs, but duration_s is a decent substitute, given that duration_db_s tends to be a small proportion of it. Here are some percentile charts:
While the 100th percentile (i.e. the max) varies wildly, from 5s to 50+s, the 99th is fairly stable and tops out at about 2s.
How to set up and validate locally
- Visit any markdown-rendered item locally (issue descriptions and comments, for example).
- Check /-/metrics for the existence of gitlab_banzai_cacheless_render_real_duration_seconds_* buckets.
- Observe changes in the count for each bucket and that buckets match the sizes specified.
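The bucket check above can also be scripted. The following is a rough sketch, assuming /-/metrics returns the Prometheus text exposition format. The sample text, its counts, and the helper name bucket_counts are hypothetical; real output will carry additional labels and different values.

```ruby
# Hypothetical excerpt of /-/metrics output; the counts are made up.
SAMPLE_METRICS = <<~TEXT
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="0.05"} 3
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="0.1"} 5
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="+Inf"} 6
  gitlab_banzai_cacheless_render_real_duration_seconds_count 6
TEXT

METRIC = 'gitlab_banzai_cacheless_render_real_duration_seconds_bucket'

# Hypothetical helper: maps each bucket's "le" label to its cumulative count.
# Lines for other metrics, or lines without an "le" label, are skipped.
def bucket_counts(text, metric = METRIC)
  text.each_line.with_object({}) do |line, counts|
    next unless line.start_with?(metric)

    le = line[/le="([^"]+)"/, 1]
    next if le.nil?

    counts[le] = line.split.last.to_f
  end
end

bucket_counts(SAMPLE_METRICS).each do |le, count|
  puts "le=#{le} count=#{count}"
end
```

Running this against the response body before and after visiting a markdown-rendered page should show the counts increasing, and the set of le values should match the bucket sizes specified.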
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.