Generate constant load in staging
Really good conversation in the incident call for https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9469 (thanks @jarv and @mwasilewski-gitlab)
- “Why didn’t we pick this up in the redis-cache apdex score on staging?”
- “You can sort of see it in the metrics, but nobody looks at them anyway, because they’re just too noisy”
- “Why are they noisy?”
- “There’s not enough traffic”
- “Then let’s ramp up the traffic constantly in staging, so that we get enough signal to drown out the noise”
This approach is discussed in the SRE Workbook: https://landing.google.com/sre/workbook/chapters/alerting-on-slos/#generating-artificial-traffic
The plan: generate constant traffic on staging. Only once we're handling (say) ~50 RPS will the key metrics become clearer against the background noise.
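As a rough illustration of why more traffic should help, here is a back-of-the-envelope sketch. It assumes (as a simplification, not something measured on staging) that each request is independently "satisfied" or not, so the apdex over a window behaves like a binomial proportion. The 0.99 apdex, the 60-second window, and the function name are all illustrative assumptions.

```python
import math


def apdex_standard_error(true_apdex: float, rps: float, window_seconds: float = 60.0) -> float:
    """Approximate standard error of an apdex ratio measured over one window.

    Simplifying assumption: every request is an independent success/failure
    trial, so the measured ratio is a binomial proportion with n = rps * window.
    """
    n = rps * window_seconds
    if n <= 0:
        raise ValueError("rps and window_seconds must be positive")
    return math.sqrt(true_apdex * (1.0 - true_apdex) / n)


# Compare the expected noise at a few hypothetical traffic levels.
for rps in (1, 5, 50):
    se = apdex_standard_error(0.99, rps)
    print(f"{rps:>3} RPS: apdex noise of roughly +/- {se:.4f} per 60s window")
```

Under this model the noise shrinks with the square root of the request count, so going from 1 RPS to 50 RPS would cut it by a factor of about 7. Real traffic is burstier than this, so treat the numbers as indicative only.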
We could start with just running gitlab-qa in parallel, over and over, discarding the results of the QA runs. Alternative load generation strategies should be considered.
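One possible shape for the "over and over, discarding the results" loop is sketched below. It is only a sketch: `run_constant_load` and `action` are hypothetical names, and `action` stands in for whatever we end up using to generate load (a gitlab-qa invocation, a plain HTTP request, or something else). The loop paces submissions against a fixed schedule so that slow runs do not lower the offered rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _discard_result(action: Callable[[], object]) -> None:
    """Run one unit of load and throw away its outcome, including failures."""
    try:
        action()
    except Exception:
        # We only want the traffic; results and errors are deliberately ignored.
        pass


def run_constant_load(
    action: Callable[[], object],
    rps: float,
    duration_seconds: float,
    max_workers: int = 16,
) -> int:
    """Submit `action` at a steady rate and return how many runs were submitted."""
    total = int(rps * duration_seconds)
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(total):
            # Pace against the ideal schedule (start + i / rps) rather than
            # sleeping a fixed interval, so timing drift does not accumulate.
            delay = start + i / rps - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            pool.submit(_discard_result, action)
    # Leaving the `with` block waits for in-flight runs to finish.
    return total
```

For example, `run_constant_load(some_action, rps=50, duration_seconds=3600)` would aim for roughly 50 runs per second for an hour. If `action` is slow, `max_workers` needs to be large enough to keep up; otherwise work queues up in the pool and the effective rate drops.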
Edited by Andrew Newdigate