Investigate long duration between PG Upgrade and System Ready
While validating the upgrade of a highly available cluster from PostgreSQL 12 to PostgreSQL 13, multiple engineers observed a long delay between the point where the upgrade reports completion and the point where the GitLab instance is actually up and running.
In that interim, the cluster returns 500 errors and appears broken, but it does resolve itself over time.
Deliverables
- Closely watch a database upgrade and gather logs to discover why the upgrade process is so lengthy
- Evaluate the root causes and:
  - Isolate the smallest possible scope to address each potential cause
  - Open an issue for each identified scope with well-defined deliverables
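
To put a number on the delay while watching an upgrade, something like the following could be started at the moment the upgrade reports complete. This is only a rough sketch, not an established procedure: the function names (`http_status`, `wait_until_ready`), the polling interval, and the maximum wait are all illustrative assumptions, and treating a 200 response as "system ready" is an assumption as well.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (e.g. the 500s seen after the upgrade) land here.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(
    probe: Callable[[], Optional[int]],
    interval: float = 5.0,   # seconds between probes (assumed value)
    max_wait: float = 3600.0,  # give up after this many seconds (assumed value)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[float], List[Tuple[float, Optional[int]]]]:
    """Poll probe() until it returns 200.

    Returns (seconds_until_ready, samples). seconds_until_ready is None if
    max_wait elapsed first. samples holds (elapsed_seconds, status) for every
    probe, so the record of 500s can be lined up against the gathered logs.
    """
    start = clock()
    samples: List[Tuple[float, Optional[int]]] = []
    while True:
        elapsed = clock() - start
        status = probe()
        samples.append((elapsed, status))
        if status == 200:
            return elapsed, samples
        if elapsed >= max_wait:
            return None, samples
        sleep(interval)
```

For example, `wait_until_ready(lambda: http_status("https://gitlab.example.com/-/readiness"))` would poll a readiness endpoint every five seconds. The hostname is a placeholder, and the health check endpoints may need to be reachable from the machine running the script (they are typically restricted by an IP allowlist).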