[gprd] Delete the `/usr/local/bin/gcs-snapshot.sh` crontab entry in the `root` crontab on `patroni-v12-10-db-gprd`

Production Change

Change Summary

[gprd] Delete the /usr/local/bin/gcs-snapshot.sh crontab entry in the root crontab on patroni-v12-10-db-gprd.

Fulfills: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14444

Change Details

Services Impacted - `Service::Patroni`
Change Technician - @nnelson
Change Reviewer - tbd
Time tracking - 15 minutes
Downtime Component - No downtime

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 1 minute

Set label `change::in-progress` on this issue

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 5 minutes

Establish a secure shell session to the trafficless replica node:
```
ssh patroni-v12-10-db-gprd.c.gitlab-production.internal
```
Switch user to root, and edit the root crontab.
```
sudo su - root
crontab -e
```
Delete the entry for /usr/local/bin/gcs-snapshot.sh.
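
The interactive edit above could also be done non-interactively. The following is a minimal sketch, not a tested procedure: the function name `strip_snapshot_entry` and the backup path are hypothetical, and it assumes that every crontab line mentioning the script (including any comment line that names it) should be removed.

```shell
#!/usr/bin/env bash
# Sketch only: drop every crontab line that mentions the snapshot script.
# Reads crontab text on stdin and writes the filtered text to stdout.
strip_snapshot_entry() {
  # index() does a literal substring match, so the dots and slashes in the
  # path are not treated as regex metacharacters.
  awk -v needle='/usr/local/bin/gcs-snapshot.sh' 'index($0, needle) == 0'
}

# Possible usage as root (the backup path is hypothetical):
#   crontab -l > /root/crontab.root.bak
#   crontab -l | strip_snapshot_entry | crontab -
```

Saving the current crontab first leaves a copy to restore from if the wrong line is removed.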

Ensure that the current session is using the root user and delete the FIFO files:
```
sudo su - root
rm -f /tmp/snapshot-start-backup
rm -f /tmp/snapshot-stop-backup
```
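
Before moving on, it may be worth confirming that both clean-up steps took effect. A minimal sketch follows; the helper names `crontab_is_clean` and `paths_absent` are hypothetical and not part of any existing tooling on the node.

```shell
#!/usr/bin/env bash
# Sketch only: two small checks for the clean-up steps.

# Succeeds when the crontab text on stdin does not mention the script.
crontab_is_clean() {
  ! grep -qF '/usr/local/bin/gcs-snapshot.sh'
}

# Succeeds when none of the given paths exist.
paths_absent() {
  local p
  for p in "$@"; do
    [ ! -e "$p" ] || return 1
  done
}

# Possible usage as root:
#   crontab -l | crontab_is_clean && echo "crontab entry removed"
#   paths_absent /tmp/snapshot-start-backup /tmp/snapshot-stop-backup \
#     && echo "FIFO files removed"
```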

Ensure that the current session is using the root user, find all running /usr/local/bin/gcs-snapshot.sh processes, review them, and record the output as a comment on this issue.
```
sudo su - root
ps aux | grep -v 'grep' | grep '/usr/local/bin/gcs-snapshot.sh' > /tmp/to_kill.txt
cat /tmp/to_kill.txt
```

Terminate all of the processes recorded in /tmp/to_kill.txt.
```
cat /tmp/to_kill.txt | awk '{ print $2 }' | xargs kill -9
```
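
If a gentler shutdown is preferred over an immediate `kill -9`, one option is to send SIGTERM first and use SIGKILL only for processes that are still alive after a grace period. This is a sketch under assumptions: the helper `terminate_pids` and the 10-second grace period in the usage comment are hypothetical, and it has not been run against the stuck snapshot processes.

```shell
#!/usr/bin/env bash
# Sketch only: ask each PID to exit with SIGTERM, wait up to <grace> seconds
# per PID, then fall back to SIGKILL for anything that is still alive.
# Usage: terminate_pids <grace-seconds> <pid>...
terminate_pids() {
  local grace="$1"; shift
  local pid waited

  # First pass: polite request to exit.
  for pid in "$@"; do
    kill -TERM "$pid" 2>/dev/null || true
  done

  # Second pass: wait for each PID, then force-kill survivors.
  for pid in "$@"; do
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
      sleep 1
      waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      kill -KILL "$pid" 2>/dev/null || true
    fi
  done
}

# Possible usage as root, with the PIDs recorded earlier:
#   terminate_pids 10 $(awk '{ print $2 }' /tmp/to_kill.txt)
```

A process blocked on a FIFO that was already deleted may ignore SIGTERM, in which case the SIGKILL fallback behaves like the original command.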

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 2 minutes

Switch to the gitlab-psql user and invoke the gcs-snapshot.sh script to verify that it runs correctly again.
```
sudo su - gitlab-psql
/usr/local/bin/gcs-snapshot.sh
```
Confirm that there are no errors in the log output from the script.

Rollback

Rollback steps - steps to be taken if this change needs to be rolled back

Estimated Time to Complete (mins) - 0 minutes

This appears to be a misconfiguration. I suspect it is left over from a previous attempt that configured the crontab for the root user.

There should be no reason to roll back. If this configuration is canonical according to Chef, the chef-client convergence process will restore it on its own, and a different change will have to be made in the cookbook recipe.

Monitoring

Key metrics to observe

Metric: patroni Service Apdex
- Location: https://dashboards.gitlab.net/d/patroni-main/patroni-overview?orgId=1
- What changes to this metric should prompt a rollback: Any sustained (more than 2-5 minutes) reduction in SLI below the 1 hour SLO.

Summary of infrastructure changes

Does this change introduce new compute instances?
- No
Does this change re-size any existing compute instances?
- No
Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?
- No

Changes checklist

Edited Oct 19, 2021 by Nels Nelson