Automate database snapshotting and sanitization
To seed test environments of any kind, we want to use the production GitLab.com database, mainly because of its size.
In this issue, we automate the creation of a sanitized dataset that can be used for testing. The aim is to stay as close to production as possible, both in size and in time.
As a first step, we want to:
- Take a snapshot of the production database
- Define and run a process to remove sensitive data from it
- Capture the resulting dataset and load it into the preprod environment, all in a fully automated fashion
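The sanitization step could look roughly like the sketch below. It is only an illustration under several assumptions: the `users` table, its `email` and `name` columns, the `SENSITIVE_COLUMNS` mapping, the salt, and the helper names are all hypothetical, and `sqlite3` stands in for the real production database so that the example is self-contained. Sensitive values are overwritten in place with a salted hash, so row counts (and therefore the size characteristics we care about) stay the same.

```python
import hashlib
import sqlite3

# Hypothetical mapping of table -> sensitive columns (not taken from the issue).
SENSITIVE_COLUMNS = {"users": ["email", "name"]}


def pseudonymize(value, salt="example-salt"):
    """Return a stable, non-reversible stand-in for a sensitive value."""
    if value is None:
        return None  # keep NULLs as NULLs
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "redacted-" + digest[:12]


def sanitize(conn):
    """Overwrite every configured sensitive column in place."""
    # Expose the Python helper as a one-argument SQL function.
    conn.create_function("pseudonymize", 1, pseudonymize)
    for table, columns in SENSITIVE_COLUMNS.items():
        assignments = ", ".join(f"{c} = pseudonymize({c})" for c in columns)
        conn.execute(f"UPDATE {table} SET {assignments}")
    conn.commit()
```

In this sketch, `sanitize(conn)` would run against the restored snapshot, never against production itself. Hashing rather than blanking keeps distinct values distinct, which may matter for columns with unique constraints; whether that trade-off is acceptable is part of defining the real process.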
Depending on how long it takes to sanitize the full database, we may be able to run this process daily. That would yield a database snapshot for testing that is very close to the production database.
Edited by Andreas Brandl