Follow-up from "Enable force cancellation of stuck build jobs"
The following discussions from !179137 (merged) should be addressed:
Documentation follow-up:
-
(high priority, no blockers) @jessieay started a discussion: (+2 comments) (Addressed in: !183684 (merged)) Suggestion: we should mention here that this is behind a default disabled feature flag and label it as experimental so that we can change the behavior before releasing it more generally: https://docs.gitlab.com/api/rest/#versioning-and-deprecations
-
@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged)) optional :force, type: Boolean, desc: 'Force cancellation for a job with a state of `canceling`', documentation: { example: true } -
@allison.browne started a discussion: (+5 comments) It's surprising to me that we don't have any documentation on canceling a job (I couldn't find it at least).
Is this the right place to document cancelation? A job should be cancelable from more than just the job log.
I was thinking it might be better in
control how jobs runhttps://docs.gitlab.com/ci/jobs/job_control/ although that mostly deals with syntax that changes when a job is picked up.Or maybe we could add it directly to jobs: https://docs.gitlab.com/ci/jobs/
cc. @lyspin, @marcel.amirault
-
@drew started a discussion: This message might need an update pending the outcome of !179137 (comment 2376359007)
Making a note of this in the follow-up so we don't lose track of it
Code clean-up:
-
@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged)) Question: do we need to accept all of these for
force_param? Seems like we could be a big more opinionated and require this to betrue(as a string) -
@drew started a discussion: (related to the item above) (Addressed in: !183684 (merged)) Another thing for the follow-up, we should probably mirror the accepted value from
app/controllers/projects/jobs_controller.rbCurrently
def force_param %w[1 t true y yes].include?(params[:force].to_s.downcase) endbut we already have an item about potentially changing that, so I'm going just to link this change back up to that.
-
@jessieay started a discussion: (+1 comment) (Addressed in: !183684 (merged)) Suggestion: instead of iterating through an array, use parameterized specs https://docs.gitlab.com/development/testing_guide/best_practices/#table-based--parameterized-tests
-
@allison.browne started a discussion: (+1 comment) Do we need to have
supports_force_cancel?. We won't see a build in thecancelingstate if the runner was behind. It seems like checking for the canceling state should be sufficient. What do you think? -
(high priority, no blockers) @drew started a discussion: (+2 comments) (Addressed in: !183684 (merged)) I think this makes two executions of the
cancelevent from CommitStatus: ... equivalent toforce_cancel, which it looks like we agreed we don't want.
Permissions clean-up:
-
@jessieay started a discussion: (+2 comments) Suggestion: I would consolidate all of these permission checks behind a new check called
(:force_cancellation, build)(defined in the policy class) so that if we want to change the authorization logic that can be done in one place -
@drew started a discussion: (+2 comments) I don't love this being a parallel definition of the permission; the serializer isn't one of the more obvious places I would look.
Something like making
BuildCancelService#allowed_to_force?a public helper method might help us keep this reliable and less brittle for whoever looks at this in a couple years.
State machine follow-up:
-
@drew started a discussion: (+1 comment) We have
dropdefined above that specifies this exact transition
Runner-Application communication follow-up:
-
(high priority, investigation required before enabling for customers) @allison.browne started a discussion: (+8 comments) Ah, right so
BuildFinishedWorkerwill revoke the token immediately and then it's up the the runner to stop execution if it can no longer communicate with gitlab on the next update.A team member should pull down this code to make sure that's how it's behaving on gdk with a registered current runner before it gets merged.
This is a critical area to get right because if the runner doesn't shut down the
after_scriptwork this easily becomes an abuse vector for .com instance runner billing.@daniel-prause, would you feel comfortable simulating this in your development environment, for us, and posting some screenshots?