
Create Enterprise guide for deploying and scaling a GitLab Runner Fleet

Darren Eastman requested to merge docs-deastman-27198 into main

What does this MR do?

Creates a guide to help organizations plan for and configure runners at scale.

Why was this MR needed?

Today we provide little or no guidance on how to deploy and scale a GitLab Runner fleet. When customers or technical account managers ask for guidance, we must address each request individually. This is inefficient and results in a negative customer experience.

Questions to answer in this MR and future iterations:

  • Which executor option should I choose for my runner fleet?
  • Which computing platform should I consider for hosting my fleet (VMs, Kubernetes)?
  • What inputs should I be aware of when making runner fleet configuration decisions?
  • How do I plan the setup of the runner fleet to meet my organization's needs?
  • How do I monitor the fleet, and which metrics do I need to watch, once it is set up (Day-0 configuration)?
  • Does GitLab have any recommendations for scaling the fleet to meet my organization's needs?
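
A minimal sketch of the kind of configuration these questions lead to: the `config.toml` fragment below sets up one Docker-executor runner on a runner manager. The `concurrent` value, runner name, URL, and image are illustrative placeholders, not recommendations the guide has settled on.

```toml
# Runner-manager-wide setting: maximum number of jobs run at once.
concurrent = 10
# How often (in seconds) the manager checks GitLab for new jobs.
check_interval = 3

[[runners]]
  name = "docker-runner-example"        # illustrative name
  url = "https://gitlab.example.com"    # your GitLab instance URL
  token = "REDACTED"                    # runner authentication token
  executor = "docker"                   # one of the executor options above
  [runners.docker]
    image = "alpine:latest"             # default image used by jobs
    privileged = false
```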

Monitoring Questions

  • How many jobs are in each status at a given time (waiting to be picked up, running)?
  • What is the error rate of the runners requesting jobs from the GitLab instance (that is, is the runner communicating effectively with the GitLab instance)?
  • If auto-scaling, how many resources are in use (VMs, pods, nodes, and so on)?
  • Where are the jobs coming from - which projects and namespaces?
  • If auto-scaling, are the runner managers oversaturated? That is, basic metrics (CPU, memory usage, and so on) of the runner managers.
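
One way to answer several of these questions is the runner's built-in Prometheus metrics endpoint. A sketch, assuming the commonly used metrics port 9252; verify the exact metric names against the `/metrics` output of your runner version:

```toml
# config.toml (global section): expose Prometheus metrics on :9252.
# Scrape http://<runner-manager>:9252/metrics for series such as
# gitlab_runner_jobs (jobs currently being handled) and
# gitlab_runner_errors_total (error counter, a useful proxy for
# runner <-> GitLab communication problems). Host-level CPU and
# memory saturation of the runner manager still needs a separate
# exporter, e.g. node_exporter.
listen_address = ":9252"
```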

What are the relevant issue numbers?

#27198 (closed)

gitlab#20278 (closed)

Edited by Suzanne Selhorn
