On SIGHUP, parent can live well beyond grace period and stops responding to signals
I don't have a huge amount of information to reproduce, apart from this production incident report on gitlab.com: gitlab-com/gl-infra/production#2452 (closed).
The logs from this gitaly shard should be visible in Kibana.
A short summary: we observed a Gitaly parent process remain alive for over an hour after gitlab-ctl hup gitaly was issued. There appears to be a race condition in the interprocess communication logic that is meant to facilitate Gitaly zero-downtime upgrades.
- gitaly-wrapper appeared to be watching the child
- gitlab-ctl hup gitaly failed, because the child would not fork until the parent exited
- Both the parent and child appeared to be successfully serving requests
- The parent did not respond to SIGTERM or SIGINT
- SIGKILL'ing the parent allowed a subsequent gitlab-ctl hup gitaly to succeed
I know this is a bit sparse, let me know if you need any more info.