Geo Test Audit: Brainstorm Topics

These are brainstorm ideas meant to help identify patterns or groupings that can be explored further. This issue was created for transparency, so it may not be easy for others to read.

I will close this issue when the ideas here are moved to a document or follow-up issues.

Information Sources Reviewed

  • [ ] Administrator docs: Geo replication
  • [ ] Engineering handbook: Geo and Disaster Recovery
  • [ ] EE Issues with Geo label
  • [ ] Other geo-related planning docs

Use Cases

Note: some use cases are currently performed manually and may be automated later.

DISASTER RECOVERY: discussion moved to its own issue

PLANNED FAILOVER (doc)

HIGH AVAILABILITY: specific considerations for this?

Clarification: there is "Geo HA" and there is "GitLab HA". The Geo HA docs reference the GitLab HA docs.

Maintaining a Geo Cluster

SETUP

  • creating secondary nodes (there are system check rake tasks for this)
  • LDAP
  • Object Storage

CONFIGURING

  • hashed storage
  • selective synchronization: files and repos only. Enhancements from Q1
  • Docker registry replication

UPGRADES

  • no downtime (no db migrations)
  • nodes temporarily run different GitLab versions

DATABASE MIGRATIONS

LOAD BALANCING

Performance Testing

Load testing

WHAT IF PRIMARY/SECONDARY CONNECTION LOST?

  • check that no events are lost
  • Sidekiq job queues and syncing services
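
One way a "no events lost" check could be framed, as a minimal sketch only: it assumes each replication event carries a unique integer ID and that a test can collect the IDs emitted by the primary and the IDs processed by the secondary after the connection is restored. The function name and the sample data are hypothetical and are not taken from Geo itself.

```python
from typing import Iterable, List


def find_missing_event_ids(
    primary_ids: Iterable[int], secondary_ids: Iterable[int]
) -> List[int]:
    """Return the IDs emitted by the primary that the secondary never processed."""
    processed = set(secondary_ids)
    return sorted(set(primary_ids) - processed)


# Hypothetical data: the primary emitted events 1-6, and the connection
# dropped while events 3 and 4 were in flight.
primary_event_ids = [1, 2, 3, 4, 5, 6]
secondary_event_ids = [1, 2, 5, 6]

missing = find_missing_event_ids(primary_event_ids, secondary_event_ids)
print(missing)  # [3, 4]
```

A test would pass only when the returned list is empty once the secondary has caught up.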

Data Management

DATA TRANSFER MODES

TYPES OF DATA TRANSFERRED

Existing E2E tests focus on

  • bidirectional transfer of repository and LFS data (via both HTTP and SSH)
  • primary-to-secondary replication of project/database data (via PostgreSQL streaming replication)

Other/Random

When new features are created, do they work on secondary nodes?

USER AUTHENTICATION, PERMISSIONS

INPUTS, OUTPUTS, STORAGE

Edited Jul 15, 2019 by Jennifer Louie