Add queuing queries duration SLI, SLO and alerting

Description

This change adds an SLI for the duration of queuing queries. In the pending builds table epic we improved the performance of these queries almost 100x, from around 5-8s to 50-100ms.

The new design depends on a sequential scan of the ci_pending_builds table, so query duration could degrade as the table grows, for example if we see a sudden increase in pending builds over the coming months or years.

This alert ensures that we are notified when query duration degrades beyond 1s. Right now the duration oscillates around 50ms:

[Graph: queuing_queries_2h]

Sometimes, typically when a long-running transaction is present, it degrades to 300ms:

[Graph: queuing_queries_2d]

It should not exceed 1s. If it does, that might indicate a problem we should look into and fix.
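To illustrate the idea behind the 1s threshold, here is a minimal Python sketch of a duration-based SLI check. It is not the alerting configuration added by this change: the function names, the sample data, and the `SLO_TARGET` value of 0.99 are assumptions made for illustration. Only the 1s threshold comes from the description above.

```python
from typing import Sequence

# From the description: queuing queries should not take longer than 1s.
SATISFIED_THRESHOLD_S = 1.0
# Hypothetical SLO target; this value is an assumption, not taken from this change.
SLO_TARGET = 0.99


def satisfied_ratio(
    durations_s: Sequence[float],
    threshold_s: float = SATISFIED_THRESHOLD_S,
) -> float:
    """Return the fraction of query durations at or below the threshold."""
    if not durations_s:
        # No traffic means nothing violated the threshold.
        return 1.0
    satisfied = sum(1 for d in durations_s if d <= threshold_s)
    return satisfied / len(durations_s)


def should_alert(
    durations_s: Sequence[float],
    threshold_s: float = SATISFIED_THRESHOLD_S,
    slo_target: float = SLO_TARGET,
) -> bool:
    """Return True when the satisfied ratio falls below the SLO target."""
    return satisfied_ratio(durations_s, threshold_s) < slo_target
```

With durations around 50ms and occasional spikes to 300ms, `should_alert` stays `False`. It turns `True` only once enough samples exceed 1s to push the ratio below the assumed target.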

/cc @andrewn @reprazent @cheryl.li @fabiopitino

Edited by Grzegorz Bizon
