Support PostgreSQL HA for registry database (Level 4 support)
## Summary

This work supports HA database configuration as outlined in [Level 4 support](https://docs.gitlab.com/omnibus/architecture/multiple_database_support/#level-4). At this level, Omnibus GitLab can configure a database server cluster in high-availability (HA) mode for the component.

## Success criteria

- [ ] Container registry can be associated with a Patroni cluster (using cluster scope)
- [ ] Container registry can function when the Patroni leader changes
- [ ] The following GitLab instances can be configured. Use the 3K reference architecture or amend where needed.
  1. [ ] Scenario 1: A single Patroni cluster for all databases serves both Rails and Container Registry.
  2. [ ] Scenario 2: Two separate Patroni clusters are provisioned. They both use the same Consul cluster and PgBouncer nodes. Both Rails and Container Registry function and always track their leaders.

## References

- See the concept of "_a Consul service per database cluster_ and _a service watch per logical database_" in https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/architecture/multiple_database_support/_index.md#level-4
- https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/database_support.md#level-4

## **Status Summary**

<details>
<summary>

### **As of September 16, 2025:**

</summary>

### **Current Status**

* Existing registry users must manually migrate their metadata from object storage to the new database structure, with significant downtime, because no automation exists for the migration
* Large registries may require extended migration timeframes (up to 40 hours in some cases)
* This is already a breaking change, regardless of particular scenarios such as multi-node Omnibus HA deployments
* There are no known automations that cover all installation scenarios

### **Key Challenges**

* HA Support Complexity: For multi-node Omnibus HA customers, there's no automatic solution that works for all scenarios
* Geo Deployment Conflicts: Geo deployments require separate, non-replicable registry databases at each site, adding operational complexity that can't be automatically resolved
* Timeline constraints: The feature is planned to be on by default in 19.0, but engineering delivery targets FY26::Q4, which is already a tight timeline given the GET and Omnibus implementation paths

### **Decision Points**

* Need to be pragmatic about what we commit to given timeline constraints
* How to reduce impact, especially for customers with managed databases
* Approach for providing comprehensive guidance for other affected customers

See detailed discussions in https://gitlab.com/groups/gitlab-org/-/epics/19215#note_2742152963 and https://gitlab.com/groups/gitlab-org/-/epics/19215#note_2741413113

</details>
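
## Illustrative sketch: leader tracking in Scenario 2

The concept cited under References, "a Consul service per database cluster and a service watch per logical database", can be pictured with a small in-memory model. This is only a sketch to make the Scenario 2 success criterion concrete; it does not reflect how Omnibus GitLab, Consul, or Patroni implement it. All names here (`LeaderTracker`, the service names `patroni-rails` and `patroni-registry`, the logical database names `main` and `registry`, and the host names) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeaderTracker:
    """Toy model: one service per Patroni cluster, one watch per logical database."""

    # service name -> current leader host (hypothetical names throughout)
    leaders: Dict[str, str] = field(default_factory=dict)
    # logical database -> name of the service it watches
    watches: Dict[str, str] = field(default_factory=dict)
    # logical database -> host its connections should currently target
    targets: Dict[str, str] = field(default_factory=dict)

    def add_cluster(self, service: str, leader: str) -> None:
        """Register a cluster's service together with its current leader."""
        self.leaders[service] = leader

    def watch(self, database: str, service: str) -> None:
        """Attach a logical database to a cluster service and point it at the leader."""
        self.watches[database] = service
        self.targets[database] = self.leaders[service]

    def leader_changed(self, service: str, new_leader: str) -> None:
        """Retarget only the logical databases that watch the affected service."""
        self.leaders[service] = new_leader
        for database, watched in self.watches.items():
            if watched == service:
                self.targets[database] = new_leader


def build_scenario_2() -> LeaderTracker:
    """Two separate clusters: one for Rails, one for the Container Registry."""
    tracker = LeaderTracker()
    tracker.add_cluster("patroni-rails", "pg-rails-1")
    tracker.add_cluster("patroni-registry", "pg-registry-1")
    tracker.watch("main", "patroni-rails")
    tracker.watch("registry", "patroni-registry")
    return tracker
```

In this model, a failover in the registry cluster (`tracker.leader_changed("patroni-registry", "pg-registry-2")`) retargets only the `registry` database and leaves `main` untouched. Scenario 1 would be the same model with both logical databases watching a single service.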
epic