Abort pipelines when user is blocked [RUN ALL RSPEC] [RUN AS-IF-FOSS] (!56126) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge 323742-abort-user-pipelines-on-block into master Mar 09, 2021

What does this MR do?

In https://gitlab.com/gitlab-org/gitlab/-/issues/323742 we identified a problem where upon blocking a user, if that user has many pipelines and builds running, then gracefully cancelling those via Ci::CancelUserPipelinesService leads to N+1s and very slow execution.

We think it is best in this case to cancel those in bulk instead, as we did with deleted projects already, which can be done more efficiently. This has the drawback of not triggering AR callbacks. This MR addresses the resulting problems only up to those changes affecting the UI so that pipelines views to not appear stuck. All other domain integrity related issues are filed in this follow-up issue: #324309

Screenshots

After blocking a user:

The review app is deployed here: https://gitlab-review-323742-abo-fiu1qi.gitlab-review.app

Implementation

I decided to slightly generalize the existing service into AbortPipelinesService instead of creating a new service, since really there was no reason for this to only act on project pipelines. It now simply acts on pipelines and the related builds, and the caller can decide how those are scoped (user, project, or otherwise.)

New queries

This MR introduces two new queries and changes one:

New: batch-select + update of user pipelines
New: batch-select + update of pipeline stages
Changed: batch-select + update of related builds, not scoped to cancelable builds only (this was likely an existing bug)

User pipelines

SELECT
    "ci_pipelines"."id"
FROM
    "ci_pipelines"
WHERE
    "ci_pipelines"."user_id" = 4626651
    AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
    AND "ci_pipelines"."id" >= 2
ORDER BY
    "ci_pipelines"."id" ASC
LIMIT 1 OFFSET 1000

https://explain.depesz.com/s/CgAA

Pipelines stages

SELECT "ci_stages"."id" FROM "ci_stages" WHERE "ci_stages"."pipeline_id" IN
  (SELECT "ci_pipelines"."id" FROM "ci_pipelines"
    WHERE "ci_pipelines"."user_id" = 4626651
    AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
    AND "ci_pipelines"."id" >= 1
  )
  AND "ci_stages"."status" IN (2,10,9,1,0,8) ORDER BY "ci_stages"."id" ASC LIMIT 1000

https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2778/commands/8664

Pipeline builds

SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."commit_id" IN
  (SELECT "ci_pipelines"."id" FROM "ci_pipelines"
    WHERE "ci_pipelines"."user_id" = 4626651
    AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
    AND "ci_pipelines"."id" >= 1
  )
  AND "ci_builds"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
  ORDER BY "ci_builds"."id" ASC LIMIT 1000

https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/2778/commands/8665

Migrations

I had originally included a migration that added an index on ci_pipelines to make the outer batch query faster, but this work was already happening in parallel in https://gitlab.com/gitlab-org/gitlab/-/issues/301222 (MR: !56314 (merged)). We can ship both as separate improvements. I measured the queries with that index in place.

I also added a new partial index on ci_stages to speed up the inner select on stages by state:

CREATE INDEX index_ci_stages_on_pipeline_id_and_id ON ci_stages USING btree (pipeline_id, id) WHERE (status = ANY (ARRAY[0, 1, 2, 8, 9, 10]));

UP

== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: migrating ================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:ci_stages, [:pipeline_id, :id], {:where=>"status IN (0, 1, 2, 8, 9, 10)", :name=>"index_ci_stages_on_pipeline_id_and_id", :algorithm=>:concurrently})
   -> 0.0061s
-- add_index(:ci_stages, [:pipeline_id, :id], {:where=>"status IN (0, 1, 2, 8, 9, 10)", :name=>"index_ci_stages_on_pipeline_id_and_id", :algorithm=>:concurrently})
   -> 0.0139s
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: migrated (0.0229s) =======

DOWN

== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: reverting ================
-- transaction_open?()
   -> 0.0000s
-- indexes(:ci_stages)
   -> 0.0078s
-- current_schema()
   -> 0.0003s
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: reverted (0.0134s) =======

Feature flag

I put this behind the feature flag abort_user_pipelines_on_block scoped to particular users so we can test this first. I was not able to test this locally very well outside of unit tests.

Rollout issue: https://gitlab.com/gitlab-org/gitlab/-/issues/324045

Does this MR meet the acceptance criteria?

Conformity

📋 Does this MR need a changelog?
- I have included a changelog entry.
- I have not included a changelog entry because the change is still behind a feature flag.
[-] Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
[-] Database guides
[-] Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
[-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Related to #323742

Edited Mar 22, 2021 by Matthias Käppler

Abort pipelines when user is blocked [RUN ALL RSPEC] [RUN AS-IF-FOSS]