Experimentally add latency threshold per Sidekiq worker and apdex scores per worker

Corrective action from www-gitlab-com#4997 (closed)

Generates an apdex score for each GitLab Sidekiq worker.

Related to https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7517

Description

Currently, we perform apdex scoring for jobs based on queue priorities. However, in https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7219, https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7464, https://gitlab.com/gitlab-org/gitlab-ce/issues/64692, et al., it has become clear that, for all practical purposes, the priority queue on which a job runs cannot be considered deterministic.

This means jobs may run on different queues at different times.

For this reason, it makes more sense to have individual per-worker SLOs, at least until our queueing is simplified.

This can then be considered a stop-gap until we can tame our queueing configuration.

Since we have ~100 different jobs, the main problem in implementing a per-worker class SLO is that we have to generate lots of different rules.

For this reason, I've opted to build a script that converts apdex scores for the past week into Prometheus recording rules.
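As a rough illustration of the idea (not the script in this merge request), the sketch below takes one observed latency per worker for the past week, snaps it to a histogram bucket boundary to use as that worker's threshold, and emits one Prometheus recording rule per worker. Everything specific here is an assumption: the bucket boundaries, the metric name `sidekiq_jobs_completion_seconds`, the `worker` label, the rule name `sidekiq_worker:apdex:ratio`, the `le` label formatting, and the example worker names. It also uses a simplified apdex (jobs under the threshold divided by all jobs) with no separate "tolerating" band.

```python
# Sketch only: bucket boundaries, metric names, and the rule name are assumptions.
BUCKETS = [0.1, 0.25, 0.5, 1, 2.5, 5, 10, 60, 300]  # seconds, hypothetical


def pick_threshold(observed_seconds, buckets=BUCKETS):
    """Return the smallest bucket boundary at or above the observed latency.

    Latencies beyond the largest boundary are capped at that boundary.
    """
    for boundary in buckets:
        if observed_seconds <= boundary:
            return boundary
    return buckets[-1]


def recording_rule(worker, threshold):
    """Build one recording rule: share of a worker's jobs finishing within threshold."""
    le = format(threshold, "g")  # assumed `le` label format, e.g. "0.5" or "60"
    selector = f'worker="{worker}"'
    expr = (
        f'sum(rate(sidekiq_jobs_completion_seconds_bucket{{{selector},le="{le}"}}[1m]))'
        f' / sum(rate(sidekiq_jobs_completion_seconds_count{{{selector}}}[1m]))'
    )
    return {"record": "sidekiq_worker:apdex:ratio", "worker": worker, "expr": expr}


def render_rules(weekly_latency):
    """Render a Prometheus rule-group YAML document as a string.

    `weekly_latency` maps worker class name -> observed latency in seconds.
    """
    lines = ["groups:", "  - name: sidekiq_worker_apdex", "    rules:"]
    for worker in sorted(weekly_latency):
        rule = recording_rule(worker, pick_threshold(weekly_latency[worker]))
        lines.append(f"      - record: {rule['record']}")
        # Single-quoted YAML scalar, since the PromQL contains double quotes.
        lines.append(f"        expr: '{rule['expr']}'")
        lines.append("        labels:")
        lines.append(f"          worker: {rule['worker']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical workers and latencies, for illustration only.
    example = {"ExampleMailerWorker": 0.8, "ExampleImportWorker": 42.0}
    print(render_rules(example))
```

Generating the rules from data keeps the ~100 per-worker thresholds out of hand-maintained configuration, so the file can be regenerated whenever worker latencies shift.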

Edited by Andrew Newdigate
