Migrate Kubernetes integration tests to use the Runner Kubernetes Cluster
What does this MR do?
- Migrate Kubernetes integration tests to use the... (!5175 - merged) • Georgi N. Georgiev | GitLab • 17.7 (this MR)
- Improve Operator installation and add GKE features (gitlab-org/ci-cd/runner-tools/grit!124 - merged) • Georgi N. Georgiev | GitLab • 17.10
- https://gitlab.com/gitlab-org/ci-cd/runner-tools/runner-kubernetes-infra/-/merge_requests/1
- Remove test dirs and flatten operator modules (gitlab-org/ci-cd/runner-tools/grit!146 - merged) • Georgi N. Georgiev | GitLab • 17.10
- Allow deploying multiple runners in a single k8... (gitlab-org/ci-cd/runner-tools/grit!147 - merged) • Georgi N. Georgiev | GitLab • 18.1
- https://gitlab.com/gitlab-org/ci-cd/runner-tools/runner-kubernetes-infra/-/merge_requests/2
- Run CI jobs in kubernetes (gitlab-org/charts/gitlab-runner!504 - merged) • Georgi N. Georgiev | GitLab • 17.9
Migrates the existing Kubernetes integration tests to run in the Kubernetes cluster already set up through https://gitlab.com/gitlab-org/ci-cd/runner-tools/runner-kubernetes-infra/-/merge_requests/1/diffs, rather than in a k3s instance running inside a VM.
Some notable changes in this MR:
- Tests are split into 3 separate jobs: legacy execution strategy, attach strategy, and all non-feature-flag (non-FF) tests
- All tests can now run in parallel and are marked as such
- A resource group is set on the integration test jobs to avoid running too many pods on the cluster at once, since test concurrency is deliberately high to keep job times short
- Each job now takes about 5 minutes, down from 30
- I fixed as many flaky tests as I could, at least the ones that were simple enough to fix
- I skipped the remaining flaky tests and will create a follow-up issue to fix them, as many of them seem implausible
- We now also generate a YAML file containing all the permissions required to run a manager in Kubernetes
- This YAML file is used to provision RBAC permissions in the cluster. If the objects are not deleted afterwards (for example, because destroy doesn't run for some reason), the cluster will clean them up, so there's no need to worry about that
- All objects created in the test namespace will eventually be cleaned up if they are not deleted, but we should still try to clean up after ourselves
Update: tests are now down to 3 minutes per job.
I switched from splitic to gotestsum. The main purpose of splitic is splitting tests across multiple jobs, which isn't applicable here. gotestsum has the following advantages:
- Faster startup time, which is where the 2 saved minutes come from
- It supports retrying failed tests; we now retry tests up to 3 times, which should reduce flakiness
- We now generate JUnit reports. As far as I could see, splitic doesn't support including a failed test's output in the report, while gotestsum does, which makes it easier to find out why a test failed
Why was this MR needed?
To make the integration tests more reliable
What's the best way to test this MR?
Integration tests should run
Running the tests against a local cluster should also work
What are the relevant issue numbers?
Closes Dogfooding the Kubernetes executor - Step 2 - R... (#38305 - closed) • Georgi N. Georgiev | GitLab • 17.7, Run kubernetes integration tests with a service... (#38306 - closed) • Georgi N. Georgiev | GitLab • 17.7