Separate replica for analytical workloads
We regularly get requests for access to the production database to gather data. Most of these requests are analytical: people want to interact with the database to figure out what questions to ask in the first place (see gitlab-com/database#135 (moved)). There are a few concerns with this:
- Analytical queries tend to run for a long time and interfere with the production workload if they run on a production replica used by the site.
- Exploratory analytics (i.e. repeatedly firing untested queries to figure out which questions to ask) tends to yield unoptimized queries, which makes long execution times even more likely.
- Access: We want to limit access to production replicas as much as we can. Yet it makes sense to grant database access to a wider group so we can gain more insights through exploratory analytics (see above).
The proposal here is to provide a separate replica for this use case.
Furthermore, the replica could also serve as a source for any ETL loads we might have in the future (for data warehousing). It may also be used for backups, provided we are confident about the implications of long-running queries and their possible interference with backups.
We want to make sure this replica does not provide hot standby feedback to other production instances upstream, so that long-running queries on it cannot delay the cleanup of old row versions there.
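
As a minimal sketch of the hot standby feedback requirement, assuming the replica is a PostgreSQL streaming standby, its postgresql.conf could look roughly like the fragment below. The delay values are illustrative assumptions, not part of this proposal.

```ini
# postgresql.conf on the analytics replica (illustrative values only)

# Do not report this replica's oldest snapshot to the upstream server,
# so long-running analytical queries cannot hold back vacuum upstream.
hot_standby_feedback = off

# With feedback off, WAL replay can conflict with running queries.
# Let replay wait this long before cancelling a conflicting query
# (assumed value; -1 would wait indefinitely).
max_standby_streaming_delay = 30min
max_standby_archive_delay = 30min
```

The trade-off is that a longer delay lets analytical queries finish but allows the replica to fall further behind the primary, which matters if the same replica is also used for backups or ETL.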