[meta] GitLab Geo (Read-Only secondary servers)
Geo Decisions
- We support only PostgreSQL
- Avatars, LFS objects, build artifacts, and attachments will be handled either by CephFS or by an open source S3 alternative (this will be done after the GA release)
- Until the above is solved, we use a simple hack for attachments: they are displayed from the primary
- We moved to SystemHooks for repository sync coordination (replacing the buffered update notifications)
- Use SystemHooks for any coordination that database replication does not cover
- Anything that doesn't have a SystemHook yet should be implemented as one if it makes sense
- Advantages: minimal code difference between CE and EE; more people use SystemHooks than a custom mechanism
- Disadvantage: the communication layer costs more (one Sidekiq job on every push, multiplied by the number of secondary servers)
- We use the SafeWebhooks implementation to validate hooks from the primary
- Authentication on a secondary is done via the OAuth protocol, authenticating against the primary server (for web)
- For Git you can use either username and password (https://) or an SSH key (ssh://)
- When logging out of a secondary you will be logged out of the primary as well (Single Sign Out)
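To make the SystemHooks trade-off above concrete, here is a minimal sketch in Python (not GitLab code). It assumes the primary queues one notification job per secondary for each push, and that a secondary accepts a hook only when a shared secret token matches. `SecondaryNode`, `fan_out_push`, `accept_hook`, the payload fields, and the example URLs are all hypothetical names chosen for illustration; the actual SafeWebhooks validation may work differently.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryNode:
    """Hypothetical record for a Geo secondary; not GitLab's actual model."""
    url: str
    hook_token: str


def fan_out_push(project_path, secondaries):
    """Build one notification job per secondary for a single push.

    This is the cost named in the decisions: jobs = pushes x secondaries.
    """
    return [
        {
            "target": node.url,
            "token": node.hook_token,
            # Illustrative payload only; real hook payloads carry more fields.
            "payload": {"event_name": "push", "project": project_path},
        }
        for node in secondaries
    ]


def accept_hook(job, expected_token):
    """Secondary-side check: accept the hook only if the token matches."""
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters of the token were correct.
    return hmac.compare_digest(job["token"].encode(), expected_token.encode())


secondaries = [
    SecondaryNode("https://geo-eu.example.com", "token-eu"),
    SecondaryNode("https://geo-us.example.com", "token-us"),
]
jobs = fan_out_push("group/project", secondaries)
print(len(jobs))                            # 2: one job per secondary
print(accept_hook(jobs[0], "token-eu"))     # True
print(accept_hook(jobs[0], "wrong-token"))  # False
```

With two secondaries, every push produces two jobs, so the job count grows linearly with the number of secondary servers.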
Proposal
Complete
- [x] Geo: Cannot delete secondary node if it's the only node present (gitlab-org/gitlab-ee#374)
- [x] Geo: Improvements and fixes after QA (gitlab-org/gitlab-ee!354)
- [x] Geo: Merge requests on Secondary should not check mergeable status (gitlab-org/gitlab-ee!366)
- [x] Geo: Benchmark (#560 (closed))
- [x] Wiki page events webhook should include Wiki attributes (gitlab-org/gitlab-ce#17507)
- [x] Omnibus tries to create Postgres extension on read only DB: (gitlab-org/gitlab-ee#628) (omnibus-gitlab!829 (merged))
- [x] Geo: The redirect URI included is not valid - OAuth (gitlab-org/gitlab-ee#650) (gitlab-org/gitlab-ee!444)
- [x] Omnibus: manage custom SSL certificate (omnibus-gitlab#712 (closed))
- [x] Improve UI for users in a Geo node (gitlab-org/gitlab-ee#640)
- [x] Improve `gitlab:env:info` (gitlab-org/gitlab-ee!459)
- [x] Geo: Move Wiki Sync to use SystemHooks (#1482 (closed))
- [x] Geo: Documentation improvements for 8.9 (gitlab-org/gitlab-ee!431) (Can wait)
- [x] Improve required SSH Keys documentation for Geo (!431 (merged))
- [x] Fix error in admin dashboard when Geo is enabled and current node is nil (#785 (closed))
- [x] Geo: when license doesn't include Geo you can't disable it anymore (#788 (closed))
- [x] Geo: improve project view UI to guide users how to clone/push from Geo secondary node (#789 (closed))
- [x] Geo: Replicate repository creation (#1071 (closed))
- [x] Geo: more documentation improvements for 8.13 (!766 (merged))
- [x] Geo: Display Custom Avatars (user, project and group) in secondary nodes (#1128 (closed))
- [x] Geo: repository is updated but displays old cached data in Web UI (#1129 (closed))
- [x] Geo: Backfill repositories from primary node without using rsync (#1190 (closed))
- [x] Omnibus - Geo: Generate SSH keys for gitlab user (omnibus-gitlab#1680 (closed))
- [x] Database Cache doesn't work as expected for Geo (gitlab-org/gitlab-ee#1217)
- [x] Geo will not let you clone from Secondary on 8.13 (gitlab-org/gitlab-ee#1243)
- [x] Geo: Improve Repository Sync (gitlab-org/gitlab-ee#1493)
- [x] Geo: Backfill stopped working after 8.15.3 (gitlab-org/gitlab-ee#1645)
- [x] Geo: Support v4 API for GitLab Geo endpoints (gitlab-org/gitlab-ee!1256)
%10.2 GENERAL AVAILABILITY
- Improve GitLab Import rake task to work with Hashed Storage and Subgroups gitlab-org/gitlab-ce#36509
- Geo repository sync worker attempts to sync repos on unhealthy shards in non-backfill conditions #3690 (closed)
- Make Geo::RepositorySyncWorker and Geo::FileDownloadDispatchWorker max_capacity configurable #3532 (closed)
- Use HTTPS cloning for Geo #3341 (closed)
- Geo: backfill and log cursor attempt to sync wikis unconditionally #3569 (closed)
- Fix file descriptor leak #3664 (closed)
- Geo: restarting sidekiq doesn't cause BaseSchedulerWorker leases to be returned #3568 (closed)
- Fix geo route whitelisting #3274 (closed)
- Secondaries forget they are #3074 (closed)
- Geo queue not drained #3373 (closed)
- Trimming the Geo event log #3577 (closed)
- API support for retrieving Geo status #3740 (closed)
- Geo secondary help users not waste time on impossible operations #2524 (closed) usability
- Geo secondaries do not handle upload or pages transfers when a project is renamed #3674 (closed)
- Build integration test framework to spin up GitLab Geo on two nodes #3765 (closed)
- Import old attachments into Uploads table gitlab-org/gitlab-ce#29240
- Improve/revise documentation for GA #3831 (closed)
- Improve error recovery of failed repository/download sync #3119 (closed)
- Review Security Architecture #3865 (closed)
- Provide instructions for SSL with PostgreSQL #1745 (closed)
- Document non-standard SSL #2857 (closed)
- Doc to add secondary node to db before starting #3400 (closed)
- Doc what omnibus Geo roles do #2825 (closed)
- Doc: order of installation #3497 (closed)
- Documentation improvements #3831 (closed)
- Workhorse to support Geo over https gitlab-workhorse#149 (closed)
- Allow sync retry on secondaries to be disabled #3810 (closed)
- Sidekiq db pool size should match thread count in Geo #3809 (closed)
- FileDownloadDispatchWorker only enqueued hourly #3771 (closed)
%10.3 PERFORMANCE AND MONITORING FOR GITLAB.COM SCALE
- Improve Geo Nodes admin screen #3195 (closed)
- Track rate of download failures with Prometheus metrics #3244 (closed)
- Support for CI build logs and artifacts #2388 (closed) ~artifacts
- Manual failover #1921 (closed)
- Geo: Make it easier to find out why a repository failed to clone #2968 (closed)
- postgres_fdw support for Geo secondary node omnibus-gitlab#2760 (closed)
- postgres_fdw support for Geo secondary node #3382 (closed)
- Increase parallelism of repo sync for cloud migration #3147 (closed)
- GeoNodeStatus calculates numbers inefficiently (requires postgres_fdw) #3699 (closed)
- Support for container registry #2870 (closed) ~artifacts
- Send GitLab version in status page and verify that all versions are the same #2115 (closed) ~"feature proposal"
- Detect and warn about broken replication slots on the Geo primary #3617 (closed) ~"feature proposal"
- Remove SSH cloning support from Geo #3891 (closed)
- Notify administrators when a node fails to sync #1816 ~"feature proposal"
- Geo Monitoring #727 (closed)
- Track replication status based on DR tables #2815 (closed)
- Geo repository sync worker attempts to sync repos on unhealthy shards in non-backfill conditions #3690 (closed) regression
- Support different object storage zone in secondary (external object storage replication)
- Enable slow query logs on Geo secondary (Geo testbed)
- Warn when Geo replication is proceeding over HTTP, rather than HTTPS #3904 (closed)
- Document Geo HA #3646 (closed)
- Build testbed with GitLab HA enabled gitlab-com/infrastructure#3082
- Message when pushing to Geo secondary should be more descriptive #3945 (closed) usability
Backlog
- Geo: Support clustered deployments with chained replication #3448 ~"feature proposal"
- GitLab CI should be able to use specific Geo secondary to clone from #3294 (closed) ~"feature proposal"
- Provide configuration to override Geo SSH sync URL #2744 (closed) ~"feature proposal"
- Investigate frontend changes for the "Auditor" user, to reuse in Geo #1709 (closed) ~"feature proposal"
- Add a blank state for the GitLab Geo feature in the Administration panel #1363 (closed) ~"feature proposal"
- Better Elasticsearch support for GitLab Geo #1186 ~"feature proposal"
- Geo: Hybrid Synchronization #623 (closed) ~"feature proposal"
- Support for Git LFS with object storage #415 (closed) ~lfs
- Allow Geo selective replication to include personal namespaces #3659 ~"feature proposal"