Runner Fleeting / Taskscaler / GRIT Test Plan
Description
The Next Runner Auto-scaling Architecture replaces Docker Machine with a system composed of Fleeting for instance provisioning, Taskscaler for autoscaling, and two new executors (Instance and Docker-Autoscaler) for running jobs in the new architecture.
These new components need to be tested at the unit, integration, and end-to-end levels. This issue outlines what testing is in place, what is needed for completeness, and which environments / dimensions we should test.
Proposal
Where do we start runner manager?
______________________________________/\____________________________________
/ \
| |
+------------------------------------------------+-----+-----+-----+-----+-----+ ___
| GitLab.com (runner-incept) | GKE | GCE | EC2 | EKS | ... | \
| +---------------------+-----+-------+-------+ | | | | | | |
| | Runner Binary | | | | | | | | | | |
___ | | +-----------------+ | | | | | | | | | | |
Runner / | | | Runner Packages | | ... | ... | ... | | | | | | | | Runner
Integ. | | | +-----------------+ | | | | | ... | ... | ... | ... | ... | | E2E
Tests | | | | Taskscaler | | | | | | | | | | | | Tests
\___ | | +-----------------+ | | | | | | | | | | |
| +---------------------+-----+-------+-------+ | | | | | | |
| | Fleeting Plugin AWS | GCP | Azure | Local | | | | | | | |
| +---------------------+-----+-------+-------+ | | | | | | ___/
+------------------------------------------------+-----+-----+-----+-----+-----+
| |
\___________________ ____________________/
\/
Where do we run the job (runner)?
Test Types
The runner binary can be started in a variety of environments. For testing purposes we start the runner binary inside a GitLab job, hence the name "runner-incept". A GitLab job is started in a runner; that job downloads the runner binary (and a plugin), registers it with GitLab, and then runs a "hello world" type job to verify it works (a runner in a runner).
However, this doesn't truly replicate a setup customers would use. A true end-to-end test would set up an environment such as a cluster in Google Kubernetes Engine (GKE) or Amazon Elastic Kubernetes Service (EKS), or a VM in Google Compute Engine (GCE) or Amazon EC2.
We also have integration tests for each component. The runner integration tests are standard Go tests with a file suffix of integration_test.go and a //go:build integration build tag. These runner tests execute jobs in an actual environment (local shell, Kubernetes cluster, etc.), but they stub out GitLab, inject the JobResponse payload directly, and verify the job output logs without sending them upstream. Likewise, each plugin can have an integration test which stubs out the fleeting API calls and verifies the resources created in the target environment.
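As a rough illustration, the sketch below shows the general shape such a runner integration test takes. It is not code from the runner repository: the local-shell execution here is a placeholder standing in for the runner's real helpers, which build a full JobResponse payload and run it through an actual executor while capturing the trace locally.

```go
//go:build integration

// The file name would carry the integration_test.go suffix,
// e.g. shell_integration_test.go (hypothetical name for this sketch).
package executor_test

import (
	"os/exec"
	"strings"
	"testing"
)

// runHelloWorldJob is a placeholder for "inject a JobResponse and execute it
// with a local executor"; here it simply runs the job script in a local shell.
func runHelloWorldJob(t *testing.T) string {
	t.Helper()
	out, err := exec.Command("sh", "-c", "echo hello world").CombinedOutput()
	if err != nil {
		t.Fatalf("job failed: %v\n%s", err, out)
	}
	return string(out)
}

func TestHelloWorldJob(t *testing.T) {
	trace := runHelloWorldJob(t)

	// Verify the job output locally instead of sending logs to GitLab.
	if !strings.Contains(trace, "hello world") {
		t.Fatalf("expected trace to contain %q, got:\n%s", "hello world", trace)
	}
}
```

A plain go test skips this file because of the build tag; running go test -tags integration ./... includes it.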
And each package (and sometimes each file) has unit tests covering small units of functionality.
So there are four kinds of tests relevant to the runner:
- runner-incept, a pseudo end-to-end test
- end-to-end tests (not yet implemented)
- integration tests (partially implemented)
- unit tests
We won't test the combination of every runner environment, every job environment, and every scenario in every test type. Instead, we will get comprehensive test coverage of each dimension in a single test type.
Runner Incept
We will test a single "hello world" scenario in each of the plugins in runner-incept:
- AWS plugin (Runner Incept test for fleeting AWS plugin (#29437 - closed) • Unassigned • 16.4)
- GCE plugin (Runner Incept test for GCE Fleeting Plugin (#36791 - closed) • Unassigned • 16.5)
- Azure plugin (Runner Incept test for Azure Fleeting Plugin (#36792) • Unassigned • 17.2)
- Kubernetes plugin
- Static plugin
End-to-End
We will test a single, realistic user scenario end-to-end in each environment in which a user might set up the runner:
- DIND (DIND E2E testing (gitlab-org/ci-cd/runner-tools/grit#68) • Unassigned)
- GKE (End-to-End test runner manager in GKE (#36793) • Unassigned)
- GCE (End-to-End test runner manager in GCE (#36794) • Adrien Kohlbecker • 17.2)
- EC2 (End-to-End test of runner manager on EC2 (#36798 - closed) • Joe Burnett • 16.7)
- EKS (End-to-End test of runner manager on EKS (#36797) • Unassigned)
- AKS (End-to-End test of runner manager of AKS (#36795) • Unassigned)
- Azure VMs (End-to-End test of runner manager on Azure VM (#36796) • Unassigned)
- etc.
Integration
We will test a variety of autoscaling cases in the Taskscaler integration tests (a toy sketch of the test shape follows this list):
- Stable. Verify a stable rate of jobs produces a stable number of instances.
- Scale-up. Verify an increasing number of jobs increases the number of instances.
- Scale-down. Verify a decreasing number of jobs decreases the number of instances (after the idle period).
(Integration tests (gitlab-org/fleeting/taskscaler#3 - closed) • Arran Walker • 16.5)
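To make the shape of these cases concrete, here is a toy sketch. It deliberately does not use Taskscaler's real API: fakeInstanceGroup and desiredCapacity are hypothetical stand-ins for a stubbed fleeting provider and the autoscaling policy, which the actual integration tests exercise through Taskscaler itself against a fake instance group.

```go
//go:build integration

package taskscaler_test

import "testing"

// fakeInstanceGroup stands in for a stubbed fleeting plugin: it only records
// how many instances are currently requested. Hypothetical, not the fleeting
// provider interface.
type fakeInstanceGroup struct{ instances int }

func (g *fakeInstanceGroup) resize(n int) { g.instances = n }

// desiredCapacity mimics the core autoscaling decision: one instance per
// pending task plus a fixed idle count. Taskscaler's real policy is richer;
// this exists only to make the test structure visible.
func desiredCapacity(pendingTasks, idleCount int) int {
	return pendingTasks + idleCount
}

func TestScaleUpAndScaleDown(t *testing.T) {
	group := &fakeInstanceGroup{}
	idle := 1

	// Scale-up: an increasing number of jobs should increase the instance count.
	group.resize(desiredCapacity(10, idle))
	if group.instances <= idle {
		t.Fatalf("expected scale-up beyond %d idle instance(s), got %d", idle, group.instances)
	}

	// Scale-down: once the queue drains (and the idle period passes), only
	// the idle instances should remain.
	group.resize(desiredCapacity(0, idle))
	if group.instances != idle {
		t.Fatalf("expected scale-down to %d idle instance(s), got %d", idle, group.instances)
	}
}
```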
And we will test each autoscaled executor in the runner integration tests:
- Instance
- Docker-Autoscaler
(Taskscaler-based executor integration tests (#30880 - closed) • Arran Walker • 16.6)
Unit
And of course we aim for 100% unit test line coverage, but in practice we set the bar at around 85-90%:
- Taskscaler (Taskscaler (autoscaler) executor unit tests (#29318 - closed) • Davis Bickford • 16.0)
- Internal Autoscaler Executor (Taskscaler (autoscaler) executor unit tests (#29318 - closed) • Davis Bickford • 16.0)
- AWS Plugin (gitlab-org/fleeting/fleeting-plugin-aws#6)
- GCE Plugin (Unit tests in GCP plugin (#29438 - moved) • Unassigned • 16.5)
- Azure Plugin (Unit test Azure plugin (#36800) • Unassigned)
- Static Plugin
- Kubernetes Plugin