Large random latency spikes in two database queries on GitLab.com

In gitlab-com/gl-infra/production#8376 (closed), we identified large (and apparently random) latency spikes in two database queries:

  • repository_find_by_path
  • repository_find_manifest_by_tag_name

Vacuuming the related tables in gitlab-com/gl-infra/production#8376 (comment 1275492609) cleared the alert, and the production incident was resolved on that basis. However, current metrics (source) show that these two queries are still the largest consumers of our apdex budget, which is hurting our error budget:

image

The image above shows the effect on apdex of one of the recurring, seemingly random latency spikes for these two queries.
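
To illustrate why a short latency spike shows up so clearly in apdex, here is a minimal sketch. It assumes a simplified apdex definition (the fraction of queries that complete within a latency threshold); the threshold, the sample latencies, and the function name are hypothetical and are not taken from the actual SLI configuration or from production data.

```python
def apdex(latencies_s, threshold_s):
    """Simplified apdex: share of samples at or below the threshold.

    Assumption: this ignores any "tolerated" band and is only meant to
    show the shape of the effect, not the real SLI definition.
    """
    if not latencies_s:
        raise ValueError("at least one latency sample is required")
    satisfied = sum(1 for t in latencies_s if t <= threshold_s)
    return satisfied / len(latencies_s)


# Hypothetical threshold and samples, in seconds.
THRESHOLD_S = 0.1
baseline = [0.01] * 100                 # every query is fast
with_spike = [0.01] * 90 + [2.0] * 10   # 10% of queries hit a spike

baseline_apdex = apdex(baseline, THRESHOLD_S)
spike_apdex = apdex(with_spike, THRESHOLD_S)
print(f"baseline apdex: {baseline_apdex:.2f}")
print(f"apdex during spike: {spike_apdex:.2f}")
```

Under this definition, every query that lands above the threshold during a spike counts fully against apdex, regardless of how far above the threshold it is. This is why infrequent spikes in only two queries can still account for a large share of the error budget.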