Consider risk mitigation for large data migrations
Problem to solve
We need a method to assess the risk of data migrations.
Target audience
Administrators of a GitLab instance.
Further details
Sometimes administrators need to perform administrative work on a GitLab instance that involves updating multiple database records or manipulating users' project files on disk.
An example is the migration of project files from legacy storage to hashed storage: some projects failed to migrate because they failed validation prior to the move (0.05% of projects were affected).
For migrations that manipulate data, we should assess the risk of the migration to determine whether we have taken sufficient precautions to mitigate it.
Proposal
Create a method based on impact and likelihood to assess risk, and compile a list of risk mitigation techniques that engineers can use when preparing these migrations.
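As a sketch only, an impact × likelihood assessment could be a simple lookup like the one below. The level names, thresholds, and `migration_risk` helper are all illustrative assumptions, not an existing GitLab API:

```ruby
# Hypothetical risk-matrix sketch: maps qualitative impact and
# likelihood ratings to a numeric score, then to an overall rating.
# All names and thresholds here are illustrative assumptions.

LEVELS = { low: 1, medium: 2, high: 3 }.freeze

# Returns :low, :medium, or :high for a proposed migration.
def migration_risk(impact:, likelihood:)
  score = LEVELS.fetch(impact) * LEVELS.fetch(likelihood)
  case score
  when 1..2 then :low
  when 3..4 then :medium
  else :high
  end
end

# Example: a storage migration touching every project (high impact)
# with a low observed failure rate would rate as :medium.
migration_risk(impact: :high, likelihood: :low) # => :medium
```

A high-risk rating would then prompt the engineer to pick mitigations from the proposed list (e.g. dry runs, backups, batching) before the migration is scheduled.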
What does success look like, and how can we measure that?
- A simple method exists for assessing risk
- The migrations issue template (if one exists) is updated with a section where engineers requesting migrations describe the risk associated with the migration
- Engineers have a list of preventative techniques to apply to high-risk migrations
- Administrators executing migrations know what is at risk and what may potentially fail
Links / references
/cc @dawsmith @rnienaber
If this is an inappropriate place for this issue, please feel free to move it to wherever is most appropriate.