Introduce DR mode for CI queueing [RUN ALL RSPEC] [RUN AS-IF-FOSS]

What does this MR do?

There's a known set of deficiencies in a CI queueing that might impact our ability to process builds.

As a way to temporarily mitigate them we allow to disable them for a very limited period.

Purpose

  • Take into account that these queries are executed against replicas, always
  • The usage of these queries is long queues or 500 returned by jobs/request

Related to

Impact

This introduces two defcon modes (that can be used separately from each other) that reduces complexity of this query:

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
  LEFT JOIN (
    SELECT "ci_builds"."project_id", count(*) AS running_builds
    FROM "ci_builds"
    WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
  ) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build' AND
  ("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;

(This misses tags matching, but this is required in all cases).

DEFCON: disable quota

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
  LEFT JOIN (
    SELECT "ci_builds"."project_id", count(*) AS running_builds
    FROM "ci_builds"
    WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
  ) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build'
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;

DEFCON: disable fair scheduling

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build' AND
  ("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY ci_builds.id ASC;

DEFCON: disable quota and fair scheduling

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build'
ORDER BY ci_builds.id ASC;

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Kamil Trzciński | OoO 2022.07.23 till 2022.08.20