Discussion: Should we recreate our zonal/regional clusters differently?
When we started with Kubernetes, we had a single, highly redundant regional cluster whose node pools were spread across all zones. We later created a set of zonal clusters to reduce the network bandwidth charges for our frontend workloads. However, zonal clusters only have one API server, so any time GKE performs maintenance on the API, Kubernetes appears down for that cluster.

If we switch to regional clusters, we can limit that perceived downtime and still operate on a fully functional cluster, even during maintenance. But this possibility opens up some new questions. Should we consider redesigning the architecture of how we've deployed our clusters?
Let's utilize this issue to discuss a few options.
Options
Proposal 1
Simply replace our zonal clusters with regional clusters whose node pools stay locked to a given zone.
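To make the shape of Proposal 1 concrete, here is a minimal sketch that derives a regional replacement for one zonal cluster. It assumes cluster names follow the `<env>-<region>-<zone letter>` pattern seen in `gprd-us-east1-b`. The function name and the dictionary keys are hypothetical and do not correspond to any GKE or Terraform API.

```python
def regional_replacement(zonal_cluster: str) -> dict:
    """Describe a regional cluster that could replace one zonal cluster.

    Assumes names look like "<env>-<region>-<zone letter>",
    for example "gprd-us-east1-b". The returned keys are illustrative only.
    """
    # Split off the environment prefix: "gprd" and "us-east1-b".
    _env, _, zone = zonal_cluster.partition("-")
    # Drop the trailing zone letter to get the region: "us-east1".
    region = zone.rsplit("-", 1)[0]
    return {
        "name": zonal_cluster,          # keep the existing cluster name
        "location": region,             # control plane becomes regional
        "node_pool_locations": [zone],  # node pools stay locked to one zone
    }


if __name__ == "__main__":
    for cluster in ("gprd-us-east1-b", "gprd-us-east1-c"):
        print(regional_replacement(cluster))
```

The point of the sketch is that only the control plane location changes. The node pools keep their single-zone placement, so the bandwidth savings that motivated the zonal clusters should be preserved.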
Proposal 2
Consolidate all clusters into one. Remove our zonal clusters and expand our existing regional cluster with node pools that are locked to a given zone, then create a new deployment per zone that targets those node pools, similar to how the zonal clusters operate today. The thought here is that each namespace effectively marks which zone we operate out of:
- gitlab - our existing regional configuration
- gitlab-b - the same deployment that is located on cluster gprd-us-east1-b
- gitlab-c - same deal for gprd-us-east1-c
- gitlab-n - repeat for n zones
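The namespace-per-zone idea above can be sketched as a small mapping from namespace to a node selector. This is only an illustration: it assumes the region is `us-east1` (taken from the cluster names above), that zone-locked nodes are selected with the well-known `topology.kubernetes.io/zone` node label, and that the base `gitlab` namespace carries no zone constraint. The function names are hypothetical.

```python
from typing import Dict, Optional

REGION = "us-east1"          # assumed from the gprd-us-east1-* cluster names
BASE_NAMESPACE = "gitlab"    # the existing regional configuration
ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def namespace_for(zone_letter: Optional[str] = None) -> str:
    """Return the namespace for a zone letter, or the base namespace."""
    if zone_letter is None:
        return BASE_NAMESPACE
    return f"{BASE_NAMESPACE}-{zone_letter}"


def node_selector_for(namespace: str) -> Dict[str, str]:
    """Return the nodeSelector a deployment in this namespace would use."""
    if namespace == BASE_NAMESPACE:
        # Regional deployment: no zone constraint, pods may land in any zone.
        return {}
    prefix = BASE_NAMESPACE + "-"
    if not namespace.startswith(prefix) or len(namespace) == len(prefix):
        raise ValueError(f"unexpected namespace: {namespace!r}")
    zone_letter = namespace[len(prefix):]
    return {ZONE_LABEL: f"{REGION}-{zone_letter}"}


if __name__ == "__main__":
    for ns in (namespace_for(), namespace_for("b"), namespace_for("c")):
        print(ns, node_selector_for(ns))
```

With this layout, the namespace name alone tells an operator which zone a deployment runs in, which is the property the zonal clusters give us today through the cluster name.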
Proposal 3
A less strenuous alternative to Proposal 2: consolidate our workloads into two regional clusters, effectively a frontend space and a backend space. One cluster has additional node pools that target specific zones. The other is effectively the same regional cluster we have today.