Geo Test Audit: Brainstorm Topics
These are brainstorm ideas meant to help identify patterns or groupings that can be explored further. This issue was created for transparency, so it may not be easy for others to read.
I will close this issue once the ideas here have been moved to a document or to follow-up issues.
Information Sources Reviewed
- [ ] Administrator docs: Geo replication
- [ ] Engineering handbook: Geo and Disaster Recovery
- [ ] EE Issues with Geo label
- [ ] Other geo-related planning docs
Use Cases
Note: some use cases are currently performed manually and may be automated later.
DISASTER RECOVERY: discussion moved to its own issue
PLANNED FAILOVER (doc)
HIGH AVAILABILITY: are there specific considerations for this?
Clarification: "Geo HA" and "GitLab HA" are separate things. The Geo HA docs reference GitLab HA.
Maintaining a Geo Cluster
SETUP
- creating secondary nodes (system check rake tasks exist for this)
- LDAP
- Object Storage
- hashed storage
- selective synchronization: files and repos only. Enhancements from Q1
- Docker registry replication
- no downtime (no DB migrations)
- nodes temporarily running different GitLab versions
DATABASE MIGRATIONS
Performance Testing
Load testing
WHAT IF THE PRIMARY/SECONDARY CONNECTION IS LOST?
- check that no events are lost
- Sidekiq job queues and syncing services
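One way to frame the "no events lost" check is as a set comparison: record which events the primary produced while the connection was down, then confirm that the secondary has processed all of them after reconnecting. The sketch below is only an illustration of that idea. The function name and the plain integer event IDs are hypothetical and do not correspond to any actual Geo API or table.

```python
def find_missing_events(primary_event_ids, secondary_event_ids):
    """Return the primary event IDs that the secondary never processed, sorted.

    Both arguments are iterables of integer IDs (hypothetical stand-ins for
    whatever identifies a replication event in the system under test).
    """
    return sorted(set(primary_event_ids) - set(secondary_event_ids))


# Hypothetical scenario: events 4 and 5 were created during the outage
# and were never picked up by the secondary after reconnecting.
primary = [1, 2, 3, 4, 5, 6]
secondary = [1, 2, 3, 6]
missing = find_missing_events(primary, secondary)
```

A test built on this idea would assert that `missing` is empty once the secondary has had time to catch up.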
Data Management
DATA TRANSFER MODES
TYPES OF DATA TRANSFERRED
Existing E2E tests focus on
- bidirectional transfer of repository and LFS data (via both HTTP and SSH)
- primary-to-secondary replication of project/database data (via PostgreSQL streaming replication)
Other/Random
When new features are created, do they work on secondary nodes?
USER AUTHENTICATION, PERMISSIONS
INPUTS, OUTPUTS, STORAGE