[meta] GitLab Geo (Read-Only secondary servers)

Geo Decisions

  • We support only PostgreSQL
  • Avatars, LFS objects, build artifacts, and attachments will be handled either by CephFS or by an open-source S3 alternative (this will be done after the GA release)
  • Until the above is solved, we use a simple hack for attachments: they are displayed from the primary
  • We moved to SystemHooks for repository sync coordination (replacing buffered update notifications)
  • Use SystemHooks for any coordination that is still missing despite database replication
    • Anything that doesn't have a SystemHook should be implemented as one, if it makes sense
    • Advantages: minimal code difference between CE and EE; more people use SystemHooks than a custom mechanism
    • Disadvantage: the communication layer costs more (one Sidekiq job on every push, multiplied by the number of secondary servers)
  • We use the SafeWebhooks implementation to validate hooks from the primary
  • Authentication on a secondary is done via the OAuth protocol, authenticating against the primary server (for web)
    • For Git you can use either username and password (https://) or an SSH key (ssh://)
  • When logging out of a secondary you will be logged out of the primary as well (Single Sign Out)
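
The cost of coordinating through SystemHooks can be sketched roughly as follows. This is a minimal illustration in Python, not GitLab's actual code: `SecondaryNode`, `enqueue_push_notifications`, `is_valid_hook`, `jobs_for`, the payload shape, and the shared-token check are all hypothetical and only assume that the primary and each secondary share a secret. It shows that a single push produces one job per secondary, and that a secondary can reject a hook whose token does not match.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryNode:
    """Hypothetical record for a secondary node; not a GitLab model."""
    url: str
    token: str  # assumed shared secret known to the primary and this secondary


def enqueue_push_notifications(project, secondaries):
    """Build one notification job per secondary for a single push.

    Stands in for enqueuing background jobs; it returns the jobs
    instead of sending anything over the network.
    """
    return [
        {
            "target": node.url,
            "token": node.token,
            "payload": {"event_name": "push", "project": project},
        }
        for node in secondaries
    ]


def is_valid_hook(received_token, expected_token):
    """Secondary-side check: accept the hook only if the token matches."""
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(received_token.encode(), expected_token.encode())


def jobs_for(pushes, secondary_count):
    """Communication cost: jobs grow as pushes multiplied by secondaries."""
    return pushes * secondary_count
```

Under these assumptions, 100 pushes with 3 secondaries give `jobs_for(100, 3) == 300` jobs, which is the multiplication the disadvantage above refers to.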

Proposal

Complete

- [x] Geo: Cannot delete secondary node if it's the only node present (gitlab-org/gitlab-ee#374)
- [x] Geo: Improvements and fixes after QA (gitlab-org/gitlab-ee!354)
- [x] Geo: Merge requests on Secondary should not check mergeable status (gitlab-org/gitlab-ee!366)
- [x] Geo: Benchmark (#560 (closed))
- [x] Wiki page events webhook should include Wiki attributes (gitlab-org/gitlab-ce#17507)
- [x] Omnibus tries to create Postgres extension on read only DB: (gitlab-org/gitlab-ee#628) (omnibus-gitlab!829 (merged))
- [x] Geo: The redirect URI included is not valid - OAuth (gitlab-org/gitlab-ee#650) (gitlab-org/gitlab-ee!444)
- [x] Omnibus: manage custom SSL certificate (omnibus-gitlab#712 (closed))
- [x] Improve UI for users in a Geo node (gitlab-org/gitlab-ee#640)
- [x] Improve `gitlab:env:info` (gitlab-org/gitlab-ee!459)
- [x] Geo: Move Wiki Sync to use SystemHooks (#1482 (closed))
- [x] Geo: Documentation improvements for 8.9 (gitlab-org/gitlab-ee!431) (Can wait)
- [x] Improve required SSH Keys documentation for Geo (!431 (merged))
- [x] Fix error in admin dashboard when Geo is enabled and current node is nil (#785 (closed))
- [x] Geo: when license doesn't include Geo you can't disable it anymore (#788 (closed))
- [x] Geo: improve project view UI to guide users how to clone/push from Geo secondary node (#789 (closed))
- [x] Geo: Replicate repository creation (#1071 (closed))
- [x] Geo: more documentation improvements for 8.13 (!766 (merged))
- [x] Geo: Display Custom Avatars (user, project and group) in secondary nodes (#1128 (closed))
- [x] Geo: repository is updated but displays old cached data in Web UI (#1129 (closed))
- [x] Geo: Backfill repositories from primary node without using rsync (#1190 (closed))
- [x] Omnibus - Geo: Generate SSH keys for gitlab user (omnibus-gitlab#1680 (closed))
- [x] Database Cache doesn't work as expected for Geo (gitlab-org/gitlab-ee#1217)
- [x] Geo will not let you clone from Secondary on 8.13 (gitlab-org/gitlab-ee#1243)
- [x] Geo: Improve Repository Sync (gitlab-org/gitlab-ee#1493)
- [x] Geo: Backfill stopped working after 8.15.3 (gitlab-org/gitlab-ee#1645)
- [x] Geo: Support v4 API for GitLab Geo endpoints (gitlab-org/gitlab-ee!1256)
  • %10.2 GENERAL AVAILABILITY
  • %10.3 PERFORMANCE AND MONITORING FOR GITLAB.COM SCALE
    • Improve Geo Nodes admin screen #3195 (closed)
    • Track rate of download failures with Prometheus metrics #3244 (closed)
    • Support for CI build logs and artifacts #2388 (closed) ~artifacts
    • Manual failover #1921 (closed)
    • Geo: Make it easier to find out why a repository failed to clone #2968 (closed)
    • postgres_fdw support for Geo secondary node omnibus-gitlab#2760 (closed)
    • postgres_fdw support for Geo secondary node #3382 (closed)
    • Increase parallelism of repo sync for cloud migration #3147 (closed)
    • GeoNodeStatus calculates numbers inefficiently (requires postgres_fdw) #3699 (closed)
    • Support for container registry #2870 (closed) ~artifacts
    • Send GitLab version in status page and verify that all versions are the same #2115 (closed) ~"feature proposal"
    • Detect and warn about broken replication slots on the Geo primary #3617 (closed) ~"feature proposal"
    • Remove SSH cloning support from Geo #3891 (closed)
    • Notify administrators when a node fails to sync #1816 ~"feature proposal"
    • Geo Monitoring #727 (closed)
    • Track replication status based on DR tables #2815 (closed)
    • Geo repository sync workers attempts to sync repos on unhealthy shards in non-backfill conditions #3690 (closed) regression
    • Support different object storage zone in secondary (external object storage replication)
    • Enable slow query logs on Geo secondary (Geo testbed)
    • Warn when Geo replication is proceeding over HTTP, rather than HTTPS #3904 (closed)
    • Document Geo HA #3646 (closed)
    • Build testbed with GitLab HA enabled gitlab-com/infrastructure#3082
    • Message when pushing to Geo secondary should be more descriptive #3945 (closed) usability
  • Backlog
    • Geo: Support clustered deployments with chained replication #3448 (closed) ~"feature proposal"
    • GitLab CI should be able to use specific Geo secondary to clone from #3294 (closed) ~"feature proposal"
    • Provide configuration to override Geo SSH sync URL #2744 (closed) ~"feature proposal"
    • Investigate frontend changes for the "Auditor" user, to reuse in Geo #1709 (closed) ~"feature proposal"
    • Add a blank state for the GitLab Geo feature in the Administration panel #1363 (closed) ~"feature proposal"
    • Better Elasticsearch support for GitLab Geo #1186 ~"feature proposal"
    • Geo: Hybrid Synchronization #623 (closed) ~"feature proposal"
    • Support for Git LFS with object storage #415 (closed) ~lfs
    • Allow Geo selective replication to include personal namespaces #3659 ~"feature proposal"
Edited by James Ramsay (ex-GitLab)