Fix restarting of the runner processes
This series fixes a couple of issuer related to watching over and restarting the runner processes.
Fixes #898 (closed)
Merge request reports
Activity
- Resolved by Mark Sapiro
Thanks for this, it has actually been a long mystery for me to fix that I haven't had ideas on why :-)
Please also add a NEWS item under src/mailman/docs/NEWS.rst linking to the open issue for it (#898 (closed)).
added 5 commits
Toggle commit listThere are coverage issues. You need to run
tox -e py37-diffcov
locally (or any Python version >=3.6) and add appropriate tests if possible or mark the uncovered lines with# pragma: nocover
so diffcov passes. These are the issues:Diff Coverage Diff: origin/master...HEAD, staged and unstaged changes ------------- src/mailman/bin/master.py (19.2%): Missing lines 280-283,289-293,300,302-305,312,314-318,415 ------------- Total: 26 lines Missing: 21 lines Coverage: 19% -------------
It does seem that this fixes the issues that caused me to revert !858 (merged) so it looks good for that.
added 6 commits
- 9cf5ba05 - Fix waiting for children.
- daa42119 - Add tests for the master runner's signal handlers.
- fa57b10c - Fix log messages related to exiting and restarting.
- 5024bebb - Do not count explicit restarts against the max_restart limit.
- f337b45e - Gracefully handle a race condition in the signal handlers.
- 0abd19a6 - Always restart runner processes.
Toggle commit listI stopped the pipeline because tests were hung. I'm not certain what's hapenning. When I run
tox -e py37-diffcov
locally, I see multiple failures and errors, but I can't see what they are because tox hangs and I have to interrupt it, however, if I runtox -e py37-diffcov -- mailman.bin.tests.test_master
I get no errors and 1 failure====================================================================== FAIL: test_sighup_handler (mailman.bin.tests.test_master.TestMaster) Invokes the SIGHUP handler. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mark/zzz/mailman/src/mailman/bin/tests/test_master.py", line 178, in test_sighup_handler self.assertEqual(len(list(m._kids)), 1 + 1) AssertionError: 1 != 2 ---------------------------------------------------------------------- Ran 9 tests in 14.170s FAILED (failures=1)
and if I remove the added tests and run
tox -e py37-diffcov
, it passes with 3 expected test failures andTotal: 37 lines Missing: 0 lines Coverage: 100%
It is a total mystery to me what's going on
added 1 commit
- 2479afb7 - Revert "Gracefully handle a race condition in the signal handlers."
marked this merge request as draft from teythoon/mailman@02199db6
added 6 commits
- 4db974ce - Fix the logging paths in the generated test configuration.
- eed68869 - Add tests for the master runner's signal handlers.
- 00d86f1e - Fix log messages related to exiting and restarting.
- e69e4c29 - Do not count explicit restarts against the max_restart limit.
- d81d198a - Gracefully handle a race condition in the signal handlers.
- ac25fffc - Always restart runner processes.
Toggle commit listI suspect it has something to do with the forked processes from unused_pid().
Good call. I opted to
nocover
the trivial exception handlers instead. Injecting a dead pid was a dodgy way to simulate that race condition anyway.I replaced the one racy sleep with inspecting of the log files instead. Works much nicer, and I also ensure that the runners signal handler is run by peeking at the logs.
One odd thing is that even though the runners orderly exit using
sys.exit
when they catch saySIGTERM
, the status signaled to the master process sometimes makesWIFSIGNALED
true, and sometimes makesWIFEXITED
true. I have no explanation for that. To cope with that, I changed the code to no longer considersig_or_exit
.All in all I'm very happy with the patch series now. Please have another look.
@teythoon Thank you for all your work on this. It's much appreciated.
mentioned in commit 2e355573
Thanks for working with me on this patch series and thank you for merging it.
However, I am very sad that the patch series was squashed into one commit.
First, I tend to spend time and effort to structure my changes in a way that transforms the source from one consistent state to another in atomic and semantically related changes (as much as possible, of course), in an effort to present the changes to whoever will read them, during the patch review or much later. There is reasoning in my commit messages, around 1500 characters or so. In the squashed commit there are barely 150 characters.
Second, this misrepresents the work I did. Not only does it misrepresent the effort and care I put into developing it, it also breaks the chain of custody: who can tell whether changes were added or omitted during the squashing? Relatedly, it also stripped all the cryptographic signatures from the commits.
However, I am very sad that the patch series was squashed into one commit.
One of my biggest complaints about git, and something I really miss from bzr, is the notion of a "main line of development". This strongly pushes the git user mentality that squash merges are the best way to maintain main branch hygiene. I had nothing to do with this specific merge, but I probably would have done it the same way. You see this workflow everywhere in git. I long ago gave up trying to fight it. Sigh.
To add to what Barry said, yes, we do squash to commits and that is currently the default way of merging the commits.
Second, this misrepresents the work I did. Not only does it misrepresent the effort and care I put into developing it, it also breaks the chain of custody: who can tell whether changes were added or omitted during the squashing?
The squashed commit has the reference to the pull request you submitted and the repo will retain all the original commits you created. If you want, you can indeed verify the contents of the squashed commit and the pull request.
Documents/mm3/core ‹master*› » git show-ref | grep refs/pullreqs/1094 3dba5ff55715388c3a0b4f04feafc1b9c2ac5cd3 refs/pullreqs/1094 Documents/mm3/core ‹master*› » git co refs/pullreqs/1094
Relatedly, it also stripped all the cryptographic signatures from the commits.
Yes, it did strip it in the squashed commit, because there is no way to retain it. Github actually signs the squashed commit with it's own key, although I question the value in that not knowing how the private key is handled within Github.
But you can verify the signature on the original commits in Mailman's repo (all original commits are retained):
Documents/mm3/core ‹3dba5ff55*› » git verify-commit 3dba5ff55715388c3a0b4f04feafc1b9c2ac5cd3 141 ↵ gpg: Signature made Thu Jan 12 22:04:47 2023 IST gpg: using EDDSA key D1FE45FB978F6B65C4C0B9AA686F55B4AB2B3386 gpg: key 31855247603831FD: no user ID gpg: Total number processed: 1 gpg: Can't check signature: No public key
There is reasoning in my commit messages, around 1500 characters or so. In the squashed commit there are barely 150 characters.
For whatever reason, I used to think that the original commit messages are appended to the squashed commit, but looks like that's not the case. I have updated the Gitlab settings to include commit messages from all the original commits so they aren't hard to find (using what I mentioned above).
Edited by Abhilash RajI understand that it is possible to verify that no changes were made to my patch series by comparing the squashed version with the original, then to verify the signatures on the original commits.
I understand that squashing the commits necessarily invalidates any signatures. I agree that what Github does is of little value, even more so since their OpenPGP implementation produces malformed signatures.
I further agree that including the original commit messages in the squashed commit would be better than what happened here, but the association between message and change would no longer be as obvious.
My feedback is that all of these issues could be avoided by not squashing the commits, and doing either fast-forward merges or regular merges.
(Context: I work on an OpenPGP implementation and we have a project going on related to git-based supply chain security, hence my interest.)