Create ability to restart Puma on Kubernetes-deployed services

The first iteration of this problem was addressed in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14987, which resulted in this Runbook MR.

As a second iteration, automating this process is the next natural step.

Problem Statement

During incident production#5171 (closed), we identified that quickly mitigating the situation required sending a HUP to the Puma processes running across our entire infrastructure. For anything running in Kubernetes, this is currently an entirely manual process. Example command:

kubectl get deploy -n gitlab -l app=webservice | awk '{print $1}' | xargs -I {} kubectl rollout restart -n gitlab deploy {}

However, the above needs to be executed across all clusters (there are 4 today) and across 2 namespaces: gitlab (running the main stage) and gitlab-cny (running the Canary stage). We also need to target two different labels: app=webservice for all client-facing services and app=sidekiq for all Sidekiq deployments.

This approach is error prone and undocumented. It also does a little too much: the NAME column header is passed through as if it were a deployment, and no such deployment exists. It may also be uncomfortable for people with less Kubernetes experience.
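
As a rough illustration of the full fan-out described above (4 clusters, 2 namespaces, 2 labels), the manual steps could be sketched as the script below. This is only a sketch under stated assumptions, not the agreed solution: the context names `cluster-a` through `cluster-d` are hypothetical placeholders for the real clusters, and the `DRY_RUN` switch is an illustrative addition. Passing the label selector directly to `kubectl rollout restart` avoids parsing the `kubectl get` table, so the NAME header never reaches the restart command.

```shell
#!/usr/bin/env bash
# Sketch: restart webservice and sidekiq deployments across clusters and namespaces.
set -euo pipefail

# Hypothetical kubeconfig context names; replace with the real four clusters.
CONTEXTS=${CONTEXTS:-"cluster-a cluster-b cluster-c cluster-d"}
# Namespaces and labels taken from the problem statement.
NAMESPACES="gitlab gitlab-cny"
LABELS="app=webservice app=sidekiq"

# Illustrative safety switch: DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

restart_all() {
  local ctx ns label
  for ctx in $CONTEXTS; do
    for ns in $NAMESPACES; do
      for label in $LABELS; do
        if [ "$DRY_RUN" = "1" ]; then
          # Print the command that would be run, one per line.
          echo kubectl --context "$ctx" -n "$ns" rollout restart deployment -l "$label"
        else
          # Restart every deployment matching the label in this cluster/namespace.
          kubectl --context "$ctx" -n "$ns" rollout restart deployment -l "$label"
        fi
      done
    done
  done
}

restart_all
```

Run as-is, the script only prints the 16 commands it would execute; setting `DRY_RUN=0` performs the restarts. Wherever this ends up running, the same loop would still need credentials for every cluster, which is part of what the solution has to decide.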

Solution

  • Come up with a solution where a similar command is run, NOT from a user workstation
  • Document said solution

Milestones

  • Solution is agreed upon
  • Solution is built
  • Solution is documented
Edited by Michele Bursi