Update histogram buckets for Banzai cacheless_render metrics (!72898) · Merge requests · GitLab.org / GitLab

John Hope requested to merge record_banzai_rendering_timings_as_histogram into master Oct 22, 2021

What does this MR do and why?

Full context is in https://gitlab.com/gitlab-org/plan/-/issues/369#note_638946301

gitlab-com/runbooks!3963 (merged) added 95th and 99th percentiles to the Banzai cacheless rendering durations chart. However, the chart is inaccurate due to the bucket sizes used to record the duration metric.

This MR changes the measurement from Gitlab::Metrics.measure to Gitlab::Metrics.histogram, using bucket sizes estimated roughly from Kibana data. The current spread of buckets is [0.001, 0.01, 0.1, 1].

This change adds more granularity in the 0.01-1s range and a higher maximum to improve overall accuracy: [0.01, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100]
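To illustrate why the bucket boundaries matter for the percentile chart, here is a minimal, self-contained Ruby sketch. It does not use Gitlab::Metrics; it only mimics how a Prometheus-style quantile is estimated from cumulative bucket counts by linear interpolation. The sample durations, the helper names (cumulative_counts, estimate_quantile), and the dropping of the repeated 0.01 bound are assumptions made for illustration, not part of this MR.

```ruby
# Illustrative stand-in for a Prometheus-style histogram; not GitLab code.
OLD_BUCKETS = [0.001, 0.01, 0.1, 1].freeze
# The proposed list repeats 0.01; uniq drops the repeat for this sketch.
NEW_BUCKETS = [0.01, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100].uniq.freeze

# Returns [upper_bound, cumulative_count] pairs, ending with a +Inf bucket.
def cumulative_counts(samples, buckets)
  bounds = buckets + [Float::INFINITY]
  bounds.map { |le| [le, samples.count { |s| s <= le }] }
end

# Estimates quantile q (0 < q <= 1) by linear interpolation inside the
# bucket that contains the target rank. If the rank falls in the +Inf
# bucket, the highest finite bound is returned.
def estimate_quantile(q, samples, buckets)
  counts = cumulative_counts(samples, buckets)
  rank = q * samples.size
  idx = counts.index { |_, count| count >= rank }
  upper, count = counts[idx]
  return buckets.last if upper == Float::INFINITY

  lower, prev = idx.zero? ? [0.0, 0] : counts[idx - 1]
  lower + (upper - lower) * (rank - prev) / (count - prev).to_f
end

# Hypothetical durations in seconds: most renders are fast, a few are slow.
SAMPLES = (Array.new(97, 0.2) + Array.new(3, 1.8)).freeze

puts estimate_quantile(0.99, SAMPLES, OLD_BUCKETS)
puts estimate_quantile(0.99, SAMPLES, NEW_BUCKETS)
```

With these invented samples, the estimate from the old buckets cannot exceed their highest finite bound of 1s, whereas the new buckets place the 99th percentile between 1s and 2s.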

These are based on measurements from Kibana. As far as I can see, we don't have measurements of Banzai rendering time in structured logs, but duration_s is a decent substitute, given that duration_db_s tends to be a small proportion of it.

The percentile charts show that while the 100th percentile (i.e. the max) varies wildly, from 5s to 50+s, the 99th is fairly stable and tops out at about 2s.

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Visit any markdown-rendered item locally (issue descriptions and comments, for example).
Check /-/metrics for the existence of gitlab_banzai_cacheless_render_real_duration_seconds_* buckets.
Observe that the count for each bucket changes and that the buckets match the sizes specified.
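The bucket check in the steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper name bucket_bounds and the sample payload are hypothetical, and the local URL in the closing comment is only a guess at a typical development setup. It relies on the Prometheus text format exposing each bucket's upper bound in an le label.

```ruby
# Extracts the distinct `le` upper bounds of one histogram from
# Prometheus text-format output, in order of appearance.
def bucket_bounds(metrics_text, metric_name)
  pattern = /^#{Regexp.escape(metric_name)}_bucket\{[^}]*le="([^"]+)"[^}]*\}\s+\S+/
  metrics_text.scan(pattern).flatten.uniq
end

METRIC = 'gitlab_banzai_cacheless_render_real_duration_seconds'

# Hypothetical, shortened payload standing in for the /-/metrics response.
SAMPLE_METRICS = <<~TEXT
  #{METRIC}_bucket{le="0.01"} 3
  #{METRIC}_bucket{le="0.05"} 5
  #{METRIC}_bucket{le="+Inf"} 7
  #{METRIC}_sum 1.2
  #{METRIC}_count 7
TEXT

puts bucket_bounds(SAMPLE_METRICS, METRIC).inspect

# Against a running instance, the text could be fetched instead, e.g.
# (assumed local address):
#   require 'net/http'
#   text = Net::HTTP.get(URI('http://localhost:3000/-/metrics'))
```

Comparing the returned bounds with the list in this MR confirms that the buckets match; the counts can be checked the same way by capturing the value after the closing brace.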

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Oct 26, 2021 by John Hope
