  • #15192
Closed
Open
Issue created Feb 09, 2022 by John Skarbek (@skarbek, Owner)

Discussion: Should we recreate our zonal/regional clusters differently?

When we started with Kubernetes, we had a single, lonely, highly redundant regional cluster, with node pools spread across all zones. We later created a set of zonal clusters to reduce the network bandwidth charges for our frontend workloads. The catch is that a zonal cluster has only 1 API server, so any time GKE performs maintenance on the API, Kubernetes appears down for that cluster.

If we switch to regional clusters, we can limit that perceived downtime and keep operating on a fully functional cluster, even during maintenance. But this possibility opens up some new questions. Should we consider redesigning the architecture of our cluster deployments?

Let's use this issue to discuss a few options.

Options

Proposal 1

Simply replace our zonal clusters with regional clusters whose node pools stay locked to a given zone.
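As a rough sketch of what "locked to a given zone" could look like in practice, the snippet below builds a `gcloud container node-pools create` invocation that restricts a node pool in a regional cluster to a single zone via `--node-locations`. The pool name, cluster name, and the helper function are hypothetical and only for illustration; they are not taken from our actual configuration.

```python
def node_pool_create_args(pool: str, cluster: str, region: str, zone: str) -> list[str]:
    """Build the argv for creating a zone-locked node pool in a regional cluster.

    All names passed in are placeholders; nothing is executed here.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--region={region}",          # the cluster (and its control plane) is regional
        f"--node-locations={zone}",    # the nodes themselves stay in one zone
    ]


# Hypothetical names, shown only to illustrate the shape of the command.
args = node_pool_create_args("example-pool-b", "example-regional", "us-east1", "us-east1-b")
print(" ".join(args))
```

The point is that the control plane gains regional redundancy while node placement, and therefore the bandwidth cost profile, stays zonal.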

Proposal 2

Consolidate all clusters into one. Remove our zonal clusters and instead expand our existing regional cluster with node pools that are locked to a given zone, then create a new deployment for each zone that targets it, similar to how zonal clusters operate today. The thought here is that each namespace effectively marks which zone we operate out of:

  • gitlab - our existing regional configuration
  • gitlab-b - the same deployment that is located on cluster gprd-us-east1-b
  • gitlab-c - same deal for gprd-us-east1-c
  • gitlab-n - repeat for n zones
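To make the namespace-per-zone idea concrete, here is a minimal sketch that derives the namespace names listed above from a set of zones and pairs each one with a `nodeSelector` patch on the standard `topology.kubernetes.io/zone` node label. The function names and the use of a plain `nodeSelector` are assumptions for illustration; an actual rollout might use node affinity or dedicated node pool labels instead.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def namespace_for_zone(zone: str, base: str = "gitlab") -> str:
    """Map a zone such as 'us-east1-b' to a namespace such as 'gitlab-b'."""
    suffix = zone.rsplit("-", 1)[-1]
    return f"{base}-{suffix}"


def zone_pin_patch(zone: str) -> dict:
    """Pod template patch that pins a deployment's pods to nodes in one zone."""
    return {"spec": {"template": {"spec": {"nodeSelector": {ZONE_LABEL: zone}}}}}


def build_layout(zones: list[str], base: str = "gitlab") -> dict:
    """Return namespace -> patch; the base namespace stays unpinned (regional)."""
    layout = {base: None}
    for zone in zones:
        layout[namespace_for_zone(zone, base)] = zone_pin_patch(zone)
    return layout


layout = build_layout(["us-east1-b", "us-east1-c"])
for namespace, patch in layout.items():
    print(namespace, patch)
```

Each zone-specific namespace would then carry the same deployment as today's zonal cluster, with scheduling restricted to the matching zone-locked node pool.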

Proposal 3

A less strenuous alternative to Proposal 2: consolidate our workloads into effectively a frontend space and a backend space. Two regional clusters would exist. One has additional node pools that target specific zones; the other is effectively the same regional cluster we have today.

Edited Dec 13, 2022 by John Skarbek