gitlab-monitor scrapes cause replication lag on archive replica
Background:
- We disabled `gitlab-monitor` on the Patroni hosts because we suspected its queries were amplifying load on the database.
- I moved the heavy queries behind the `/ci_builds` endpoint to the archive replica with https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1701
We now observe unsustainable replication lag on the archive replica.
A single scrape takes about 30s, and this query is the top offender (27s):
- Query: https://gitlab.com/gitlab-org/gitlab-exporter/blob/master/lib/gitlab_exporter/database/ci_builds.rb#L10
- Plan: https://explain.depesz.com/s/d1A1
There's no direct way to speed up the query, because it is expected to scan a lot of data (about 15GB of buffers per query).
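As a rough sanity check on why this one query weighs so heavily on the replica, the figures above can be turned into a few back-of-the-envelope numbers. This is a sketch, not a measurement: it assumes the 15GB figure means GiB of buffer accesses counted in 8 KiB pages (PostgreSQL's default block size), and the 60 s interval passed to the hypothetical `busy_fraction` helper is an illustrative value, not the configured scrape interval.

```python
# Back-of-the-envelope numbers for the ci_builds query on the archive replica.
# Inputs from the issue: ~30 s per scrape, ~27 s for the top query,
# ~15 GB of buffers per query. Reading "GB" as GiB is an assumption.

PAGE_SIZE_BYTES = 8 * 1024  # PostgreSQL default block size (8 KiB)

scrape_seconds = 30.0
query_seconds = 27.0
buffers_gib = 15.0


def query_share(query_s: float, scrape_s: float) -> float:
    """Fraction of one scrape spent in the top query."""
    return query_s / scrape_s


def pages_touched(gib: float, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Number of buffer pages corresponding to `gib` GiB of buffer accesses."""
    return int(gib * 1024**3 // page_size)


def read_rate_mib_per_s(gib: float, seconds: float) -> float:
    """Average buffer throughput implied by touching `gib` GiB in `seconds`."""
    return gib * 1024 / seconds


def busy_fraction(query_s: float, interval_s: float) -> float:
    """Share of wall-clock time spent in this query for a HYPOTHETICAL
    scrape interval (the real interval is not stated in the issue).
    Capped at 1.0: scrapes longer than the interval would overlap."""
    return min(query_s / interval_s, 1.0)


if __name__ == "__main__":
    print(f"share of scrape: {query_share(query_seconds, scrape_seconds):.0%}")
    print(f"pages touched:   {pages_touched(buffers_gib):,}")
    print(f"avg rate:        {read_rate_mib_per_s(buffers_gib, query_seconds):.0f} MiB/s")
    print(f"busy @ 60 s:     {busy_fraction(query_seconds, 60.0):.0%}")
```

Under these assumptions the query accounts for about 90% of the scrape, touches roughly 1.97 million pages, and averages around 570 MiB/s of buffer traffic for its entire runtime. This fits the point above: without reading less data, there is little room to make a single run cheaper.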
This issue tracks the infrastructure changes.
Edited by Andreas Brandl