Follow-up from "Enable force cancellation of stuck build jobs"

The following discussions from !179137 (merged) should be addressed:

(high priority, no blockers) @jessieay started a discussion: (+2 comments) (Addressed in: !183684 (merged))

Suggestion: we should mention here that this is behind a default disabled feature flag and label it as experimental so that we can change the behavior before releasing it more generally: https://docs.gitlab.com/api/rest/#versioning-and-deprecations

@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged))

          optional :force, type: Boolean, desc: 'Force cancellation for a job with a state of `canceling`', documentation: { example: true }

@allison.browne started a discussion: (+5 comments)

It's surprising to me that we don't have any documentation on canceling a job (I couldn't find it at least).

Is this the right place to document cancelation? A job should be cancelable from more than just the job log.

I was thinking it might be better in control how jobs run https://docs.gitlab.com/ci/jobs/job_control/ although that mostly deals with syntax that changes when a job is picked up.

Or maybe we could add it directly to jobs: https://docs.gitlab.com/ci/jobs/

cc. @lyspin, @marcel.amirault
@drew started a discussion:

This message might need an update pending the outcome of !179137 (comment 2376359007)

Making a note of this in the follow-up so we don't lose track of it

@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged))

Question: do we need to accept all of these for force_param? Seems like we could be a big more opinionated and require this to be true (as a string)
@drew started a discussion: (related to the item above) (Addressed in: !183684 (merged))
Another thing for the follow-up, we should probably mirror the accepted value from app/controllers/projects/jobs_controller.rb

Currently
```
  def force_param
    %w[1 t true y yes].include?(params[:force].to_s.downcase)
  end
```
but we already have an item about potentially changing that, so I'm going just to link this change back up to that.
@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged))

Suggestion: instead of iterating through an array, use parameterized specs https://docs.gitlab.com/development/testing_guide/best_practices/#table-based--parameterized-tests
@allison.browne started a discussion: (+1 comment)

Do we need to have supports_force_cancel?. We won't see a build in the canceling state if the runner was behind. It seems like checking for the canceling state should be sufficient. What do you think?
(high priority, no blockers) @drew started a discussion: (+2 comments) (Addressed in: !183684 (merged))

I think this makes two executions of the cancel event from CommitStatus: ... equivalent to force_cancel, which it looks like we agreed we don't want.

@jessieay started a discussion: (+2 comments)

Suggestion: I would consolidate all of these permission checks behind a new check called (:force_cancellation, build) (defined in the policy class) so that if we want to change the authorization logic that can be done in one place
@drew started a discussion: (+2 comments)

I don't love this being a parallel definition of the permission; the serializer isn't one of the more obvious places I would look.

Something like making BuildCancelService#allowed_to_force? a public helper method might help us keep this reliable and less brittle for whoever looks at this in a couple years.

@drew started a discussion: (+1 comment)

We have drop defined above that specifies this exact transition

(high priority, investigation required before enabling for customers) @allison.browne started a discussion: (+8 comments)

Ah, right so BuildFinishedWorker will revoke the token immediately and then it's up the the runner to stop execution if it can no longer communicate with gitlab on the next update.

A team member should pull down this code to make sure that's how it's behaving on gdk with a registered current runner before it gets merged.

This is a critical area to get right because if the runner doesn't shut down the after_script work this easily becomes an abuse vector for .com instance runner billing.

@daniel-prause, would you feel comfortable simulating this in your development environment, for us, and posting some screenshots?

Edited Mar 28, 2025 by Lysanne Pinto