WIP: Graceful docker container stopping
What does this MR do?
This tries to stop docker containers gracefully by using dockers stop method. Docker stop first sends sigterm and then sigkill after a timeout. The timeout could still be made configurable.
It also useful to combine this with !971 (closed) to enable cleanup traps in bash.
Why was this MR needed?
We're using privileged containers which may leave things around on the host. (currently nfs server processes) We also provision test-hardware and turn it off again using bash traps or child reaping docker entrypoints.
Are there points in the code the reviewer needs to double check?
I'm not familiar enough with go contexts. I've used context.TODO()
to also run on canceled jobs.
Does this MR meet the acceptance criteria?
-
Documentation created/updated -
Added tests for this feature/bug -
In case of conflicts with master
- branch was rebased
What are the relevant issue numbers?
Merge request reports
Activity
added Community contribution label
added [Deprecated] Category:Runner devopsverify grouprunner labels
@erushton could you assign this to a reviewer?
added 39 commits
-
c9fe77c9...94782499 - 38 commits from branch
gitlab-org:master
- c18d3cf0 - Docker Executor: use Stop for graceful shutdown
-
c9fe77c9...94782499 - 38 commits from branch
@ajwalker want to give this a first review?
added ReviewP3 label
assigned to @ajwalker
Thank you for this MR @0xQSL!
We're currently working on !2073 (merged) to make waiting for a container easier and more efficient by using Docker's own long polling
WaitContainer
. This removes the need for us to constantly poll to see whether a container has been stopped. As you'll know from working on this, the waiting part can be the most difficult.!2073 (merged) should be in a mergable state soon, and once merged, we'll be looking into implementing the graceful stop. If you're interested in helping with that once it is merged, we can return to this MR. If you're busy, let me know, and I'll reuse some of your solution to have this implemented when we can.
@0xQSL Now that !2073 (merged) has been merged, are you interested in modifying your solution and adding something like a
GracefulKillWait()
to wait.go (that falls back to justKillWait()
after a certain timeout)?Don't worry if you're unable to. Let me know, and I can finish this MR if you're not available.
In either case, thank you for your contribution.
mentioned in merge request !2013 (closed)
added sectionops label
@0xQSL @ajwalker @steveazz This MR looks like it has stalled, though !2013 (closed) was closed in favor of this and !2073 (merged) has made it in as a dependency.
mentioned in issue gitlab-org/quality/triage-reports#1524 (closed)
unassigned @ajwalker
requested review from @ajwalker
mentioned in issue gitlab-org/quality/triage-reports#3024 (closed)
mentioned in issue gitlab-org/quality/triage-reports#3173 (closed)
mentioned in issue #4843 (closed)
mentioned in issue gitlab-org/quality/triage-reports#3418 (closed)
We've updated the target branch to be main since we are removing the master branch, for more information check #27239 (closed)
mentioned in issue gitlab-org/quality/triage-reports#3602 (closed)
mentioned in issue gitlab-org/quality/triage-reports#3790 (closed)
mentioned in issue #6359 (closed)
mentioned in issue gitlab#15603
mentioned in issue gitlab-org/quality/triage-reports#3951 (closed)
mentioned in issue gitlab-org/quality/triage-reports#4139 (closed)
mentioned in issue gitlab-org/quality/triage-reports#4349 (closed)
mentioned in issue gitlab-org/quality/triage-reports#4539 (closed)
mentioned in merge request !3128 (merged)
I'm now closing this in favor of !3128 (merged), which is now part of our 14.4 iteration plan.