Determine Risk for Post Deployment/Background Migrations in relation to Release Preparation

Problem Statement

We need to clarify at what point a migration can be considered safe for release, and in particular whether a release can safely proceed while migrations are still running.

Today

We have 3 types of database migrations:

Regular migrations - usually schema changes or data modifications that are quick
Post Deployment migrations - executed after a deployment due to application requirements
Background migrations - long running; duration depends on the amount of data and how it is being modified

See the Migration Style Guide.

Risk Assessment

Currently we only validate that post-deployment migrations have been executed; we do not validate that migrations have completed. As a result, a long-running migration may still be active when Release Managers begin release procedures. The primary question is whether this poses a risk, or, put another way, what level of risk we are exposed to.
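To make the gap concrete, here is a minimal sketch contrasting the two checks. It assumes a hypothetical record per post-deployment migration that tracks whether the migration itself ran and the status of any background work it queued. The `PostDeployMigration` class, the `Status` values, and both check functions are illustrative only and do not reflect how GitLab actually stores or reads migration state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical lifecycle states for queued background work.
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class PostDeployMigration:
    version: str                               # hypothetical migration version identifier
    executed: bool                             # has the post-deployment migration itself run?
    background_status: Optional[Status] = None  # status of queued background work, if any


def current_check(pdms):
    """Mirrors today's validation: every post-deployment migration has been executed."""
    return all(m.executed for m in pdms)


def completion_check(pdms):
    """Stricter check: executed AND any queued background work has finished."""
    return all(
        m.executed and m.background_status in (None, Status.FINISHED)
        for m in pdms
    )


pdms = [
    PostDeployMigration("20240401000000", executed=True),
    PostDeployMigration("20240402000000", executed=True, background_status=Status.RUNNING),
]
print(current_check(pdms))     # True: both migrations have executed
print(completion_check(pdms))  # False: background work is still running
```

Both migrations pass today's check even though one of them still has work in flight, which is exactly the window this issue asks us to assess.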

Questions

GitLab has a very large database, so it is common for large data migrations to take a long time. When we check that a post-deployment migration (PDM) has run, we would find out if it harmed the database, because it would cause an incident. However, since we do not wait for lengthy background migrations to complete, I wonder whether we are missing potential failure scenarios that our self-managed users would be subject to.

Exit Criterion

Assess risk - reach out to dev teams as necessary to help us determine an answer
If we want to wait for PDMs to complete, we'll have a bit of work to do: create the necessary issues to ensure that we check PDMs up to the SHA selected for release and validate that all migrations have completed.
If we deem the risk low, document the decision and close this issue.
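If we choose to wait for completion, the check described above could look roughly like the following sketch. It assumes a hypothetical commit history (oldest first) where each commit lists the migration versions it introduces, plus a hypothetical mapping of migration version to status. `Commit`, `migrations_up_to`, `blocking_migrations`, and the `"finished"` status string are all illustrative names; the real source of migration state is not specified in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    sha: str
    # Migration versions introduced by this commit (hypothetical data).
    migrations: List[str] = field(default_factory=list)


def migrations_up_to(history: List[Commit], release_sha: str) -> List[str]:
    """Collect migrations from history (oldest first) up to and including release_sha."""
    collected: List[str] = []
    for commit in history:
        collected.extend(commit.migrations)
        if commit.sha == release_sha:
            return collected
    raise ValueError(f"release SHA {release_sha} not found in history")


def blocking_migrations(history: List[Commit], release_sha: str,
                        status: Dict[str, str]) -> List[str]:
    """Return migrations up to release_sha that have not finished.

    A migration with no recorded status is treated as not finished, which
    errs on the side of blocking the release.
    """
    return [
        version
        for version in migrations_up_to(history, release_sha)
        if status.get(version) != "finished"
    ]


history = [
    Commit("a1b2c3d", ["20240401000000"]),
    Commit("e4f5a6b", ["20240402000000"]),
    Commit("c7d8e9f", ["20240403000000"]),
]
status = {"20240401000000": "finished", "20240402000000": "running"}

print(blocking_migrations(history, "e4f5a6b", status))  # ['20240402000000']
```

Migrations introduced after the selected SHA (here `20240403000000`) are ignored, so work merged after the release candidate cannot block it; an empty result would mean the release is safe by this check.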

Edited Apr 15, 2024 by John Skarbek