Docs: Kubernetes with SaaS runners for autoscaling
Problem to solve
The docs don't currently explain why Kubernetes isn't used for autoscaling with GitLab SaaS runners. Adding information would help users understand why.
Further details
Proposal
Add docs to summarize the following info from this internal Slack discussion:
We don't use Kubernetes for our SaaS runners because GitLab.com is a multi-tenant service. We must ensure that workloads from user A can never be accessed by workloads of user B. Kubernetes doesn't provide enough separation. There is always a risk that someone will be able to escape from the container in which the job is executed and leave something on the Kubernetes node. I'm not sure whether this is still the case, but in the past even the Kubernetes docs clearly stated that security boundaries should be enforced by separating clusters or node groups, and not just by pods and/or namespaces. In the case of GitLab.com, we would need to create a new cluster and/or a new dedicated node pool for every single customer that could run jobs on our SaaS runners. And that just doesn't scale for us 🙂
So we've taken a different approach, where we use full virtualisation (we trust the separation provided by the hypervisors used by GCP or AWS) and we delete the instance after a single job has been executed. That allows us to have a fresh, never-used-before environment for every job that is started. If a customer doesn't run workloads for multiple tenants that need to be separated, then there should be no security risk in using Kubernetes to handle runner autoscaling.
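To make the single-use instance idea easier to picture in the docs, here is a minimal Python sketch of the lifecycle described above. It is a toy model under stated assumptions, not GitLab's actual implementation: `EphemeralVM`, `SingleUseAutoscaler`, and `run_job` are hypothetical names, and no real GCP or AWS API is called.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class EphemeralVM:
    """Hypothetical stand-in for a cloud VM instance (not a real GCP/AWS API)."""
    vm_id: int
    jobs_run: list = field(default_factory=list)
    deleted: bool = False


class SingleUseAutoscaler:
    """Toy model: every job gets a fresh VM that is deleted after the job."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.history = []  # every VM ever created, kept only for inspection

    def run_job(self, tenant, job):
        # Create a fresh, never-used environment for this one job.
        vm = EphemeralVM(vm_id=next(self._ids))
        self.history.append(vm)
        try:
            # The job would execute here; we only record it.
            vm.jobs_run.append((tenant, job))
        finally:
            # Delete the instance even if the job fails, so nothing is reused.
            vm.deleted = True
        return vm.vm_id


autoscaler = SingleUseAutoscaler()
first = autoscaler.run_job("user-a", "build")
second = autoscaler.run_job("user-b", "test")
```

In this sketch, jobs from `user-a` and `user-b` never share an instance, which is the property the Slack message relies on the hypervisor to enforce.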