Post-deploy migrations: rollback safety indicator
Story
In one of our most recent incidents where we needed to roll back (production#5952 (closed)), we had post-deploy migrations. Because we were rolling changes out carefully, we paused the post-deploy migrations job: before playing it, we wanted to know that things were safe and that we would not need to roll back. However, during the investigation, an engineer discovered that all of the post-deploy migrations at the time were relatively harmless and backward compatible. This means we could have let the post-deploy migrations job run without fear of harming the application if we later performed a rollback.
It is currently our policy that if post-deploy migrations are detected, we immediately indicate that rolling back is NOT an option.
We want to address post-deploy migrations in their current form (&585 (closed)). In the meantime, I'd like to propose that each migration itself visibly indicate whether it can safely be ignored when deciding to roll back.
Unrefined Proposal
- During development of a post-deploy migration, add some indicator of whether a rollback of code will or will not be impacted by the proposed migration
- This may be a specially crafted method that can be run via a rake task to provide us with this information
- This may be a specially crafted comment in the migration itself that release-tools could parse or, as a first iteration, that a release manager (RM) can read to determine whether rolling back is feasible
- Release-tools can read this indicator and be smarter when notifying us whether a given post-deploy migration can be ignored if the need to roll back arises
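
To make the comment-based option above more concrete, here is a minimal sketch of how tooling might read such an indicator. Everything in it is an assumption for illustration: the `# rollback_safe: true|false` marker format and the `rollback_safe` and `can_roll_back` helpers are hypothetical and do not exist in GitLab or release-tools today. It is written in Python only to keep the example self-contained; real tooling would likely live alongside release-tools.

```python
import re
from typing import Dict, Optional

# Hypothetical marker: a comment line such as "# rollback_safe: true".
# This format is an assumption made for the sketch, not an existing convention.
MARKER = re.compile(r"^\s*#\s*rollback_safe:\s*(true|false)\s*$", re.MULTILINE)


def rollback_safe(migration_source: str) -> Optional[bool]:
    """Return True or False when the marker is present, None when it is missing."""
    match = MARKER.search(migration_source)
    if match is None:
        return None
    return match.group(1) == "true"


def can_roll_back(migration_sources: Dict[str, str]) -> bool:
    """Report whether a rollback is still an option.

    A rollback stays possible only if every pending post-deploy migration
    is explicitly marked safe. A missing marker counts as unsafe, which
    mirrors the current policy. With no pending migrations the result is True.
    """
    return all(rollback_safe(src) is True for src in migration_sources.values())
```

Treating an unmarked migration as unsafe keeps the current behaviour as the default, so the indicator can only relax the policy when an author has explicitly opted in.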
Reference:
- Confirmation that rollbacks during unsafe post-deploy migrations are ill-advised: gitlab-org/gitlab#345193 (closed)
- Delivery team's rollback procedure: https://gitlab.com/gitlab-org/release/docs/-/blob/master/runbooks/rollback-a-deployment.md#1-gather-package-information
- Remove the blocking nature of post-deploy migrations: &585 (closed)