Gitaly process stuck on shutdown after a gitlab-ctl reconfigure
After a configuration change made in https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/4594 triggered a gitlab-ctl reconfigure on the production fleet, some Gitaly nodes got stuck while trying to shut down.
The stuck nodes became unavailable, causing a service outage for the projects hosted on them.
See the production incident "2024-04-08: 500 errors when accessing repositories" (gitlab-com/gl-infra/production#17790, closed) for details.
To restore service to the affected nodes, we needed to restart Gitaly.
Action Items
- Once we identify the cause, restart Gitaly deploys in https://ops.gitlab.net/gitlab-org/release/tools/-/pipeline_schedules by pinging @release-managers in Slack.
- Update the chef-repo merge request template to remove the default description.
Edited by Ahmad Sherif