Slow ci runners query at /api/:version/runners
There is a slow recursive CI runners query that is dominating the total database time spent per recursive query.
Fingerprint: 5e3534a257ca56bf
SELECT "ci_runners".* FROM ((SELECT ci_runners.* FROM "ci_runner_projects" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_projects"."runner_id" WHERE "ci_runner_projects"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" WHERE "project_authorizations"."user_id" = $1 AND (project_authorizations.access_level >= $2)))
UNION
(SELECT ci_runners.* FROM "ci_runner_namespaces" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_namespaces"."runner_id" WHERE "ci_runner_namespaces"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" INNER JOIN "members" ON "namespaces"."id" = "members"."source_id" WHERE "members"."type" = $3 AND "members"."source_type" = $4 AND "namespaces"."type" = $5 AND "members"."user_id" = $6 AND "members"."requested_at" IS NULL AND (access_level >= $7) AND "members"."access_level" = $8)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = $9 AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) ci_runners WHERE "ci_runners"."id" NOT IN (SELECT "ci_runners"."id" FROM ((SELECT ci_runners.* FROM "ci_runner_projects" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_projects"."runner_id" WHERE "ci_runner_projects"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" WHERE "project_authorizations"."user_id" = $10 AND (project_authorizations.access_level >= $11)))
UNION
(SELECT ci_runners.* FROM "ci_runner_namespaces" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_namespaces"."runner_id" WHERE "ci_runner_namespaces"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" INNER JOIN "members" ON "namespaces"."id" = "members"."source_id" WHERE "members"."type" = $12 AND "members"."source_type" = $13 AND "namespaces"."type" = $14 AND "members"."user_id" = $15 AND "members"."requested_at" IS NULL AND (access_level >= $16) AND "members"."access_level" = $17)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = $18 AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) ci_runners WHERE (contacted_at > $19)) ORDER BY "ci_runners"."id" ASC LIMIT $20 OFFSET $21
This query (by fingerprint) is initiated by the gitlab
user hitting the GET /api/:version/runners
endpoint. After some testing I found I could reproduce the query with GET parameters of ?status=offline
.
There is a note in the CI::Runner
model:
62 scope :online, -> { where('contacted_at > ?', online_contact_time_deadline) }
63 scope :recent, -> { where('ci_runners.created_at > :date OR ci_runners.contacted_at > :date', date: 3.months.ago) }
64 # The following query using negation is cheaper than using `contacted_at <= ?`
65 # because there are less runners online than have been created. The
66 # resulting query is quickly finding online ones and then uses the regular
67 # indexed search and rejects the ones that are in the previous set. If we
68 # did `contacted_at <= ?` the query would effectively have to do a seq
69 # scan.
70 scope :offline, -> { where.not(id: online) }
The second most expensive query looks to be directly related wrapping the first query in a COUNT(*)
.
Fingerprint: 68ff6becd1577b9e
SELECT COUNT(*) FROM (SELECT $1 AS one FROM ((SELECT ci_runners.* FROM "ci_runner_projects" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_projects"."runner_id" WHERE "ci_runner_projects"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" WHERE "project_authorizations"."user_id" = $2 AND (project_authorizations.access_level >= $3)))
UNION
(SELECT ci_runners.* FROM "ci_runner_namespaces" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_namespaces"."runner_id" WHERE "ci_runner_namespaces"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" INNER JOIN "members" ON "namespaces"."id" = "members"."source_id" WHERE "members"."type" = $4 AND "members"."source_type" = $5 AND "namespaces"."type" = $6 AND "members"."user_id" = $7 AND "members"."requested_at" IS NULL AND (access_level >= $8) AND "members"."access_level" = $9)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = $10 AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) ci_runners WHERE "ci_runners"."id" NOT IN (SELECT "ci_runners"."id" FROM ((SELECT ci_runners.* FROM "ci_runner_projects" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_projects"."runner_id" WHERE "ci_runner_projects"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" WHERE "project_authorizations"."user_id" = $11 AND (project_authorizations.access_level >= $12)))
UNION
(SELECT ci_runners.* FROM "ci_runner_namespaces" INNER JOIN "ci_runners" ON "ci_runners"."id" = "ci_runner_namespaces"."runner_id" WHERE "ci_runner_namespaces"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" INNER JOIN "members" ON "namespaces"."id" = "members"."source_id" WHERE "members"."type" = $13 AND "members"."source_type" = $14 AND "namespaces"."type" = $15 AND "members"."user_id" = $16 AND "members"."requested_at" IS NULL AND (access_level >= $17) AND "members"."access_level" = $18)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = $19 AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) ci_runners WHERE (contacted_at > $20)) LIMIT $21) subquery_for_count
In production this query is often operating 30 minutes out of every hour.
With an average of 4 seconds per query.
When I dig a bit I can see one specific user is responsible for many of these requests.
There now exists indexes on contacted_at that probably obsolete the comment about the offline
scope. We can probably just switch to contacted_at <= ?
.