Automate deployment of GitLab HA and Geo
What's this issue all about? (Background and context)
Customers have reported that deploying GitLab with HA and Geo, especially at large scale, is manual and time-consuming. Many of them end up writing their own automation to create and configure the nodes required for their GitLab deployment. Providing some out-of-the-box automation would save customers time, improve their first impressions of GitLab, and result in more standardized deployments that are easier to support.
What are the overarching goals for the research?
What functionality does the tool need to provide before customers are likely to use it? Which third-party tools do customers prefer for infrastructure automation and why? Can the tool be a standalone tool initially?
What hypotheses and/or assumptions do you have?
- Customers are already automating the deployment and maintenance of large GitLab deployments
- There is a steep learning curve for new customers onboarding and attempting to create their own automation
- Customers delay scaling their deployment to meet demand because of the effort required
- GitLab already offers a compelling HA and Geo replication story to prospects evaluating the product. We would have an even greater competitive advantage if we offered a fast and simple way to get up and running with these architectures.
- Administrators have a set of automation tools that they are comfortable using and use regularly. If we provide a solution that uses familiar tooling, it is more likely to be adopted and liked.
What research questions are you trying to answer?
How long does it take to set up HA? Do customers use the reference architectures to design their own deployments? Which parts of the install, scaling, and upgrade experience are customers automating? Which tools are being used for automation? If we offer a tool to automate setting up HA/Geo, how much control should it have over the architecture that is deployed?
What persona, persona segment, or customer type experiences the problem most acutely?
Persona:
- Sidney (System Administrator)
Customer type:
- Customers running GitLab on multiple nodes for scaling or high availability purposes
- Customers with geographically dispersed teams that operate a Geo secondary cluster to reduce latency
What business decisions will be made based on this information?
Should the automation tooling be tied to a tier?
Who will be leading the research?
PM Distribution
What timescales do you have in mind for the research?
The research has been conducted over the past four months. Highlights from customer interviews are provided in the comments section below.
Relevant links (script, prototype, notes, etc.)
Findings from research
- Ansible and Terraform are the preferred tools for creating automation
- Automation should provide a default architecture and default values based on the reference architectures
- Prioritize the 2,000-user architecture
- Defaults are a good starting point but they need to be customizable
- Needs to run across different cloud platforms, in on-prem data centers, and in air-gapped environments
- Should support upgrades
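
The "defaults that remain customizable" finding can be illustrated with a small sketch. This is not a proposed design: the `DeploymentSpec` class, its field names, and the node count are hypothetical placeholders, and none of the values are taken from the actual reference architectures. It only shows the idea of starting from defaults and overriding selected settings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeploymentSpec:
    """Hypothetical deployment settings; names and values are illustrative only."""
    user_count: int = 2000      # the 2,000-user architecture is prioritized first
    app_nodes: int = 2          # placeholder, NOT taken from the reference architecture
    geo_enabled: bool = False   # whether a Geo secondary is deployed
    air_gapped: bool = False    # whether the target environment has no internet access


def build_spec(**overrides) -> DeploymentSpec:
    """Start from the defaults and apply only the settings the administrator overrides.

    An unknown setting name raises TypeError, so typos are caught early.
    """
    return replace(DeploymentSpec(), **overrides)


# Keep the default sizing, but enable Geo in an air-gapped environment.
spec = build_spec(geo_enabled=True, air_gapped=True)
```

In practice the same pattern would more likely be expressed as Ansible or Terraform variables, given the tooling preference reported above.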
FYI: @gitlab-org/distribution Consolidated feedback on automating orchestration of HA and Geo