A path to enable Blue/Green cluster deployment scenarios
Scope
To improve the fault tolerance and high availability of our infrastructure, we need more than one cluster serving traffic behind a single environment-specific endpoint, e.g. https://pre.gitlab.com. We also want to treat our clusters as cattle instead of pets, so we need disposable infrastructure in which we don't upgrade or modify core cluster components, such as the Kubernetes version or the service mesh engine, in place. Instead, we want to be able to add clusters to the global endpoint and remove them from it on the fly.
Acceptance criteria
- Multiple clusters are connected to the same data plane and serve environment-specific traffic.
Suggested migration path
Since spinning up a new data plane might take an undefined amount of time and research, I suggest doing the following now:
- Replace the Cloudflare DNS entry for pre.gitlab.com with a Cloudflare load balancer. This should be straightforward and require no downtime.
- Connect the existing PRE-1 cluster to this load balancer
- Run the tests and make sure they all pass
- Connect another cluster, PRE-2, to this load balancer
- Disable the PRE-1 cluster for the sake of the experiment
- Run the tests
- Connect both clusters to the load balancer
- Run the tests
- Spin up a brand new cluster with the same configuration as PRE-2
- Run the tests
- Automate global traffic manipulation scenarios
Suggested traffic manipulation scenarios
- Add a new cluster to the global endpoint
- Remove a cluster from the global endpoint
- Shift X% of the traffic from one cluster to another
- Failover to a healthy cluster if some cluster fails the healthcheck
- Blue/Green cluster rollout
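
To make the scenarios above concrete, here is a minimal in-memory sketch of how they could be modelled before any automation is written. It does not call Cloudflare or any real load balancer API; `GlobalEndpoint`, `shift`, `effective_split` and `blue_green` are hypothetical names introduced only for illustration, and the assumption that weights are percentage points summing to 100 is mine, not something this issue specifies.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalEndpoint:
    """Hypothetical model of one global endpoint fronting several clusters."""

    weights: dict = field(default_factory=dict)  # cluster name -> traffic weight (percent)
    healthy: dict = field(default_factory=dict)  # cluster name -> last healthcheck result

    def add_cluster(self, name, weight=0):
        # A new cluster joins with zero weight by default, so it takes no traffic yet.
        self.weights[name] = weight
        self.healthy[name] = True

    def remove_cluster(self, name):
        # Assumption: a cluster must be drained (weight 0) before it is removed.
        if self.weights.get(name, 0) != 0:
            raise ValueError(f"drain {name} before removing it")
        self.weights.pop(name, None)
        self.healthy.pop(name, None)

    def shift(self, src, dst, percent):
        # Move `percent` points of weight from src to dst.
        if percent < 0 or percent > self.weights[src]:
            raise ValueError("cannot shift more weight than src holds")
        self.weights[src] -= percent
        self.weights[dst] += percent

    def effective_split(self):
        # Failover: unhealthy clusters are skipped and the rest is renormalised.
        live = {n: w for n, w in self.weights.items() if self.healthy[n] and w > 0}
        total = sum(live.values())
        return {n: 100 * w / total for n, w in live.items()} if total else {}


def blue_green(endpoint, blue, green, step=25, verify=lambda: True):
    """Shift traffic from blue to green in steps; roll back if verify() fails."""
    moved = 0
    while endpoint.weights[blue] > 0:
        amount = min(step, endpoint.weights[blue])
        endpoint.shift(blue, green, amount)
        moved += amount
        if not verify():
            endpoint.shift(green, blue, moved)  # undo everything moved so far
            return False
    return True


lb = GlobalEndpoint()
lb.add_cluster("PRE-1", weight=100)
lb.add_cluster("PRE-2")
lb.shift("PRE-1", "PRE-2", 30)
print(lb.effective_split())  # {'PRE-1': 70.0, 'PRE-2': 30.0}

lb.healthy["PRE-1"] = False  # PRE-1 fails its healthcheck
print(lb.effective_split())  # {'PRE-2': 100.0}
```

In this sketch `verify` stands in for "run the tests" from the migration path: a blue/green rollout only keeps going while the tests pass, and otherwise returns all traffic to the original cluster.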