Corrective action: Patroni disk snapshot concurrency
Summary
In incident production#6999 (closed), a disk snapshot failed because of GCP API quota rate limiting.
Related Incident(s)
Originating issue(s): production#6999 (closed)
Desired Outcome/Acceptance Criteria
Spread out the disk snapshots across different instances so that they do not run concurrently and hit GCP API rate limits.
- Role: https://gitlab.com/gitlab-com/gl-infra/chef-repo/blob/71071fb811160fb91129a2c53930277675428cbb/roles/gprd-base-db-patroni-ci-backup-replica.json
- Script: https://ops.gitlab.net/gitlab-cookbooks/gitlab-patroni/-/blob/master/templates/default/gcs-snapshot.sh.erb
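One possible way to spread the snapshots out is to give each instance a stable start delay derived from its hostname, so that hosts sharing the same schedule do not call the GCP API at the same moment. The sketch below is only an illustration of that idea, not the contents of the role or script linked above: the function name `snapshot_delay_seconds`, the `window_seconds` parameter, and the example hostnames are all hypothetical.

```python
import hashlib


def snapshot_delay_seconds(hostname: str, window_seconds: int = 3600) -> int:
    """Map a hostname to a stable delay in the range [0, window_seconds).

    Hypothetical helper: the same hostname always yields the same delay,
    and different hostnames tend to land at different points in the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Hash the hostname so the delay is deterministic across runs and hosts.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and fold it into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    # Example hostnames are assumptions made for illustration only.
    for host in ("patroni-ci-01", "patroni-ci-02", "patroni-ci-03"):
        print(host, snapshot_delay_seconds(host))
```

A snapshot job could sleep for this many seconds before it starts. Hashing does not guarantee an even spacing, so with a small number of hosts an explicit per-host offset may be the simpler choice.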
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'priority::4')
Edited by Filipe Santos