[meta] Listing all issues related to Jan 31st outage to track their progress
Tracking _all_ the issues that were spawned from or referenced in the post-mortem blog post for the Jan 31st outage: https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
Mostly doing this for my own benefit, but feel free to update or recommend a better way of tracking this.
Issues and their updates:
- :large_orange_diamond: Removal of users by spam should not hard delete https://gitlab.com/gitlab-org/gitlab-ce/issues/27581
- :white_check_mark: Update PS1 across all hosts to more clearly differentiate between hosts and environments (#1094)
- :white_check_mark: Prometheus monitoring for backups (#1095)
  - Switched to zero-loss continuous streaming to an S3 bucket and Azure Blob Storage using WAL-E (#1152).
  - WAL-E is implemented per #1152, but no monitoring is available yet, so #1095 remains open.
- :white_check_mark: Set PostgreSQL's max_connections to a sane value (#1096)
- :white_check_mark: Investigate Point in time recovery & continuous archiving for PostgreSQL (#1097)
  - Closed per comment https://gitlab.com/gitlab-com/infrastructure/issues/494#note_23009747 (using WAL-E instead of PITR).
- :white_check_mark: Hourly LVM snapshots of the production databases (#1098)
- :white_check_mark: Azure disk snapshots of production databases (#1099)
  - Superseded by "Fix Azure snapshots" (#1606).
  - #1606 is now dependent on "Convert GitLab ARM Hosts to 'Managed Disk'" (#1649).
- :white_check_mark: Move staging to the ARM environment (#1100)
- :white_check_mark: Recover production replica(s) (#1101)
- :large_orange_diamond: Automated testing of recovering PostgreSQL database backups (#1102)
  - Superseded by "Automate restoring a database with Wal-E" (#1265).
- :white_check_mark: Improve PostgreSQL replication documentation/runbooks (#1103)
- :white_check_mark: Investigate pgbarman for creating PostgreSQL backups (#1105)
  - Closed by the decision to use WAL-E (https://gitlab.com/gitlab-com/infrastructure/issues/494#note_23009747).
- :white_check_mark: Investigate using WAL-E as a means of Database Backup and Realtime Replication (#494)
- :white_check_mark: Build Streaming Database Backup (#1152)
- :white_check_mark: Assign an owner for data durability (#1163)
- :white_check_mark: Merge Request: Bundle pgpool-II 3.6.1 (https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/1251)
  - Closed in favor of "Adding pgbouncer as EE specific dependency" (https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/1345).
- :white_check_mark: Connection pooling/load balancing for PostgreSQL (#259)
  - Superseded by setting up pgbouncer (#1440).
- :large_orange_diamond: Tool for executing and reverting Rails migrations on staging (#811)
  - Lower priority than other tasks; may be superseded by #1504.
- :large_orange_diamond: Disaster recovery for everything that is not the database (#1161)
  - This is itself a meta issue, with several linked issues that may still take a while to resolve.