Investigate long duration between PG Upgrade and System Ready

During our validation of upgrading a highly available cluster from PostgreSQL 12 to PostgreSQL 13, multiple engineers noticed what feels like a long delay between when the upgrade reports completion and when the GitLab instance is actually up and running.

During that interval, the cluster returns 500 errors and appears broken, but it recovers on its own over time.

Deliverables

  1. Closely watch a database upgrade and gather logs to discover why the upgrade process is so lengthy
  2. Evaluate the root causes and:
    1. Isolate the smallest possible scope to address each potential cause
    2. Open an issue for each identified scope with well-defined deliverables
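
To put a number on the gap described above, one option is to start a small poller the moment the upgrade reports completion and record every response until the instance answers with a 200. The following is only a sketch under assumptions: the helper names `probe_http` and `wait_until_ready` are hypothetical, and it assumes a health URL (for example a `/-/readiness` path on the instance) is reachable from the host running the script.

```python
import time
import urllib.error
import urllib.request


def probe_http(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here, e.g. 500 while the instance recovers.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def wait_until_ready(probe, interval=5.0, max_wait=3600.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns 200.

    Returns (elapsed_seconds, samples), where samples is a list of
    (elapsed_seconds, status) tuples, one per poll.
    """
    start = clock()
    samples = []
    while True:
        status = probe()
        elapsed = clock() - start
        samples.append((elapsed, status))
        if status == 200:
            return elapsed, samples
        if elapsed >= max_wait:
            raise TimeoutError(f"not ready after {elapsed:.0f}s")
        sleep(interval)
```

A possible invocation, with a placeholder hostname, is `wait_until_ready(lambda: probe_http("https://gitlab.example.com/-/readiness"))`. The returned samples can then be lined up against the gathered logs to see which component was still starting during each failed poll.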