Create ability to restart puma on Kubernetes deployed services
The first iteration to solve this problem was solved in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14987, that resulted in this Runbook MR
As a second iteration, automating this process is the next natural step.
Problem Statement
During incident production#5171 (closed), it was identified that in order to quickly mitigate the situation, we needed to HUP
the Puma processes running across our entire infrastructure. Currently this is handled very manually for anything in Kubernetes. Example command:
kubect get deploy -n gitlab -l app=webservice | awk '{print $1}' | xargs -I {} kubectl rollout restart -n gitlab deploy {}
However, the above, needs to be executed across all clusters, today there are 4 of them, across 2 namespaces, gitlab
(running the main stage) and gitlab-cny
(running the Canary stage). And we also need to target two differing labels, app=webservice
for all client facing services, as well as app=sidekiq
for all sidekiq deployments.
The above is very error prone, not documented, does a little too much because NAME
is returned as a result and no such deployment for that exists, and for persons with less experience with Kubernetes, may be uncomfortable.
Solution
-
Come up with a solution where a similar command is run, NOT from a user workstation -
Document said solution
Milestones
-
Solution is agreed upon -
Solution is built -
Solution is documented