Corrective Action: Improve safety around Terraform destroy operations (disks)
## Summary

We should have more safety for cases where Terraform destroy operations are triggered by accident. An apply should fail if multiple critical resources containing data (disks) are being deleted.

---

Suggested by @ahmadsherif during https://gitlab.com/gitlab-com/gl-infra/production/-/issues/15997 after an accidental deletion of three recently created Gitaly nodes. Luckily, we were able to recover most of the data relatively quickly, but with less than 30 minutes of data loss on each node.

## Related Incident(s)

<!--
Note the originating incident(s) and link known related incidents/other issues. The relation will happen automatically if you are creating this issue from an incident, if this isn't done already please uncomment the following line:
-->

Originating issue(s): gitlab-com/gl-infra/production#15997

## Desired Outcome/Acceptance Criteria

<!--
How will you know that this issue is complete? If you have any initial thoughts on implementation details (e.g. what to do or not do, gotchas, edge cases etc.), please share them while they are fresh in your mind.
-->

- [ ] Are we able to set this lifecycle setting for the data disk component?
- [ ] If yes, what are the possible caveats? For example, would we be unable to legitimately delete data disks via CI when desired?
- [ ] Decide on a path forward based on the findings from the previous question.
- [ ] Update the lifecycle in Terraform.

## Associated Services

<!--
Apply the appropriate services associated with this corrective action if applicable.
~"Service::Terraform"
-->

## Corrective Action Issue Checklist

* [x] Link the incident(s) this corrective action arose from
* [x] Give context for what problem this corrective action is trying to prevent re-occurring
* [x] Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
* [x] Assign a [priority](https://about.gitlab.com/handbook/engineering/infrastructure/team/reliability/issues.html#issue-priority) (this will default to 'Reliability::P4' but should match the severity of the related incident)
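
## Illustrative sketch: failing a pipeline on multiple disk deletions

The Summary asks that an apply fail when multiple data-bearing resources are about to be deleted. One possible way to approximate that outside of Terraform itself is a CI step that inspects the JSON form of a saved plan (as produced by `terraform show -json <planfile>`) before the apply runs. The sketch below is only an illustration and not the agreed path forward. The resource type `google_compute_disk`, the threshold of one deletion, and the function names are assumptions made for the example, not taken from the incident or the existing Terraform code.

```python
import json
import sys

# Assumption: resource types that hold data. Adjust to the real modules.
PROTECTED_TYPES = frozenset({"google_compute_disk"})
# Assumption: more than one protected deletion in a single plan is suspicious.
MAX_DELETIONS = 1


def protected_deletions(plan, protected_types=PROTECTED_TYPES):
    """Return addresses of protected resources the plan would delete.

    Replacements count too, because their action list also contains "delete".
    """
    doomed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if rc.get("type") in protected_types and "delete" in actions:
            doomed.append(rc.get("address", "<unknown>"))
    return doomed


def check_plan(plan, max_deletions=MAX_DELETIONS):
    """Return True if the plan stays within the allowed number of deletions."""
    return len(protected_deletions(plan)) <= max_deletions


def main(stream=None):
    """Read plan JSON from a stream and return a process exit code."""
    plan = json.load(stream if stream is not None else sys.stdin)
    doomed = protected_deletions(plan)
    if len(doomed) > MAX_DELETIONS:
        print("Refusing to apply; plan deletes protected resources:", file=sys.stderr)
        for address in doomed:
            print("  " + address, file=sys.stderr)
        return 1
    return 0
```

In a pipeline this could be wired up by piping the output of `terraform show -json` for the saved plan into a small wrapper that calls `sys.exit(main())`. A legitimate bulk deletion would then need an explicit override, which is the same caveat raised in the acceptance criteria for the lifecycle setting.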