Update histogram buckets for Banzai cacheless_render metrics

John Hope requested to merge record_banzai_rendering_timings_as_histogram into master

What does this MR do and why?

Full context is in https://gitlab.com/gitlab-org/plan/-/issues/369#note_638946301

gitlab-com/runbooks!3963 (merged) added 95th and 99th percentiles to the Banzai cacheless rendering durations chart. However, the chart is inaccurate due to the bucket sizes used to record the duration metric.

This MR changes the measurement from Gitlab::Metrics.measure to Gitlab::Metrics.histogram, using bucket sizes calculated roughly from Kibana data. The current spread of buckets is [0.001, 0.01, 0.1, 1].

This change adds more granularity in the 0.01-1s range and a higher maximum to improve overall accuracy: [0.01, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100]
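To illustrate why the bucket spread matters for the percentile chart, here is a minimal sketch in plain Ruby. It assumes the chart derives percentiles the way Prometheus' histogram_quantile does: by linear interpolation inside the bucket that contains the target rank, falling back to the largest finite bound when the rank lands in the +Inf bucket. The helper names and the sample durations are hypothetical, and the proposed bucket list is used with its repeated 0.01 entry collapsed.

```ruby
# Cumulative bucket counts, as a Prometheus-style histogram would record them.
# The final +Inf bucket catches anything above the largest finite bound.
def cumulative_counts(bounds, observations)
  (bounds + [Float::INFINITY]).map do |le|
    [le, observations.count { |value| value <= le }]
  end
end

# Estimate quantile q by linear interpolation inside the bucket that
# contains the target rank (a simplified take on histogram_quantile).
def estimate_quantile(q, bounds, observations)
  buckets = cumulative_counts(bounds, observations)
  rank = q * observations.size
  index = buckets.index { |_, count| count >= rank }
  upper, count = buckets[index]
  # A rank in the +Inf bucket can only be reported as the largest finite bound.
  return bounds.last if upper == Float::INFINITY

  lower = index.zero? ? 0.0 : buckets[index - 1][0]
  below = index.zero? ? 0 : buckets[index - 1][1]
  lower + (upper - lower) * (rank - below) / (count - below).to_f
end

OLD_BUCKETS = [0.001, 0.01, 0.1, 1].freeze
# Proposed list with the repeated 0.01 entry collapsed (assumption).
NEW_BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10.0, 50, 100].freeze

# Hypothetical sample: 100 renders, mostly fast, with a slow tail up to 6s.
SAMPLE = ([0.02] * 60) + ([0.3] * 30) + ([1.8] * 8) + [4.0, 6.0]

old_p99 = estimate_quantile(0.99, OLD_BUCKETS, SAMPLE)
new_p99 = estimate_quantile(0.99, NEW_BUCKETS, SAMPLE)
puts "p99 with old buckets: #{old_p99}s"
puts "p99 with new buckets: #{new_p99}s"
```

With the old buckets, any p99 above 1s is reported as the 1s ceiling, so the chart flattens out. With the wider spread, the estimate lands inside the 2-5s bucket, which contains the sample's true p99 of 4s.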

These are based on measurements from Kibana. As far as I can see, we don't have measurements of Banzai rendering time in structured logs, but duration_s is a decent substitute, given that duration_db_s tends to be a small proportion of it. Here are some percentile charts:

While the 100th percentile (i.e. the max) varies wildly from 5s to 50+s, the 99th is fairly stable and tops out at about 2s.

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

  1. Visit any markdown-rendered item locally (issue descriptions and comments, for example).
  2. Check /-/metrics for the existence of gitlab_banzai_cacheless_render_real_duration_seconds_* buckets.
  3. Observe that the count for each bucket changes and that the buckets match the sizes specified.
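
The bucket check in steps 2 and 3 can also be scripted. The following is a rough sketch that assumes the /-/metrics output has already been saved as text in the Prometheus exposition format. The helper name and the sample excerpt are hypothetical, and the real output will likely carry additional labels and different counts.

```ruby
METRIC = 'gitlab_banzai_cacheless_render_real_duration_seconds_bucket'

# Return the distinct `le` upper bounds found for METRIC in the metrics text.
def bucket_bounds(metrics_text)
  metrics_text.each_line.filter_map do |line|
    next unless line.start_with?(METRIC)

    line[/le="([^"]+)"/, 1]
  end.uniq
end

# Hypothetical excerpt of /-/metrics output; the values are illustrative only.
SAMPLE_OUTPUT = <<~TEXT
  # TYPE gitlab_banzai_cacheless_render_real_duration_seconds histogram
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="0.05"} 3
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="0.1"} 5
  gitlab_banzai_cacheless_render_real_duration_seconds_bucket{le="+Inf"} 6
  gitlab_banzai_cacheless_render_real_duration_seconds_count 6
TEXT

puts bucket_bounds(SAMPLE_OUTPUT).inspect
```

Comparing the returned bounds against the list in this MR confirms that the new buckets are in effect.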

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by John Hope