Introduce DR mode for CI queueing [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
There's a known set of deficiencies in a CI queueing that might impact our ability to process builds.
As a way to temporarily mitigate them we allow to disable them for a very limited period.
Purpose
- Take into account that these queries are executed against replicas, always
- The usage of these queries is long queues or 500 returned by
jobs/request
Related to
Impact
This introduces two defcon modes (that can be used separately from each other) that reduces complexity of this query:
explain analyze
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
LEFT JOIN (
SELECT "ci_builds"."project_id", count(*) AS running_builds
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
"projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
(project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
"ci_builds"."type" = 'Ci::Build' AND
("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;
(This misses tags matching, but this is required in all cases).
DEFCON: disable quota
explain analyze
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
LEFT JOIN (
SELECT "ci_builds"."project_id", count(*) AS running_builds
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
"projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
(project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
"ci_builds"."type" = 'Ci::Build'
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;
DEFCON: disable fair scheduling
explain analyze
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
"projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
(project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
"ci_builds"."type" = 'Ci::Build' AND
("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY ci_builds.id ASC;
DEFCON: disable quota and fair scheduling
explain analyze
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
"projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
(project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
"ci_builds"."type" = 'Ci::Build'
ORDER BY ci_builds.id ASC;
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because _____.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Edited by Kamil Trzciński (Back 2025-01-01)