Skip to content
Snippets Groups Projects

Introduce DR mode for CI queueing [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Merged Kamil Trzciński requested to merge ci-queueing-defcon into master
All threads resolved!

What does this MR do?

There's a known set of deficiencies in a CI queueing that might impact our ability to process builds.

As a way to temporarily mitigate them we allow to disable them for a very limited period.

Purpose

  • Take into account that these queries are executed against replicas, always
  • The usage of these queries is long queues or 500 returned by jobs/request

Related to

Impact

This introduces two defcon modes (that can be used separately from each other) that reduces complexity of this query:

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
  LEFT JOIN (
    SELECT "ci_builds"."project_id", count(*) AS running_builds
    FROM "ci_builds"
    WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
  ) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build' AND
  ("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;

(This misses tags matching, but this is required in all cases).

DEFCON: disable quota

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
  LEFT JOIN (
    SELECT "ci_builds"."project_id", count(*) AS running_builds
    FROM "ci_builds"
    WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) AND "ci_builds"."runner_id" IN (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."runner_type" = 1) GROUP BY "ci_builds"."project_id"
  ) AS project_builds ON ci_builds.project_id=project_builds.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build'
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;

DEFCON: disable fair scheduling

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build' AND
  ("projects"."visibility_level" = 20 OR (EXISTS (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (namespaces.id = projects.namespace_id)) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT 1 FROM "base_and_ancestors" AS "namespaces" LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id WHERE "namespaces"."parent_id" IS NULL AND (COALESCE(namespaces.shared_runners_minutes_limit, 400, 0) = 0 OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE((namespaces.shared_runners_minutes_limit + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), (400 + COALESCE(namespaces.extra_shared_runners_minutes_limit, 0)), 0) * 60))))
ORDER BY ci_builds.id ASC;

DEFCON: disable quota and fair scheduling

explain analyze
SELECT "ci_builds".* FROM "ci_builds"
  INNER JOIN "projects" ON "projects"."id" = "ci_builds"."project_id"
  LEFT JOIN project_features ON ci_builds.project_id = project_features.project_id
WHERE
  ("ci_builds"."status" IN ('pending')) AND "ci_builds"."runner_id" IS NULL AND
  "projects"."shared_runners_enabled" = TRUE AND "projects"."pending_delete" = FALSE AND
  (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0) AND
  "ci_builds"."type" = 'Ci::Build'
ORDER BY ci_builds.id ASC;

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Kamil Trzciński

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Grzegorz Bizon
  • Grzegorz Bizon removed review request for @grzesiek

    removed review request for @grzesiek

  • Kamil Trzciński added 2943 commits

    added 2943 commits

    Compare with previous version

  • added 1 commit

    • a36d194b - Introduce DR mode for CI queueing

    Compare with previous version

  • Andrew Newdigate approved this merge request

    approved this merge request

  • @ayufan this is great :heart: thank you. When this is delivered, we need to make sure that the infrastructure team are aware of it, when to use it, and how to enable it, probably through updates to the runbooks documentation.

    cc @brentnewton @jarv @dawsmith

  • changed milestone to %13.12

  • Kamil Trzciński requested review from @grzesiek

    requested review from @grzesiek

  • Kamil Trzciński removed review request for @grzesiek and @andrewn

    removed review request for @grzesiek and @andrewn

  • Kamil Trzciński added 5990 commits

    added 5990 commits

    Compare with previous version

  • Kamil Trzciński resolved all threads

    resolved all threads

  • Kamil Trzciński requested review from @grzesiek

    requested review from @grzesiek

  • Kamil Trzciński requested review from @mkaeppler

    requested review from @mkaeppler

  • Matthias Käppler
  • Matthias Käppler approved this merge request

    approved this merge request

  • Matthias Käppler resolved all threads

    resolved all threads

  • Matthias Käppler
  • Kamil Trzciński resolved all threads

    resolved all threads

  • Kamil Trzciński added 296 commits

    added 296 commits

    Compare with previous version

  • Kamil Trzciński resolved all threads

    resolved all threads

  • Matthias Käppler enabled an automatic merge when the pipeline for 61f38205 succeeds

    enabled an automatic merge when the pipeline for 61f38205 succeeds

  • mentioned in commit 0ee2d125

  • added workflowcanary label and removed workflowstaging label

  • added workflowproduction label and removed workflowcanary label

  • Please register or sign in to reply
    Loading