Review and discuss end-to-end test environments/pipelines strategy
Context
We currently run end-to-end tests in CI with GitLab deployed from two types of packages to several test environments:
| Package | Test Environment |
|---|---|
| Omnibus-gitlab | Docker in `package-and-qa` jobs |
| | Docker in nightly jobs |
| | Reference architecture tester |
| Cloud Native GitLab | Review Apps |
| | Preprod/Staging/StagingRef/Canary/Production |
We run tests against those environments in several pipelines:
- `gitlab-org/gitlab` `master` and MRs
- `gitlab-org/gitlab-qa` `master` and MRs
- nightly
- Preprod/Staging/StagingRef/Canary/Production
The tests we run range from the full suite, to a subset of reliable tests, to a smaller subset of smoke tests, to various tests in different orchestrated scenarios.
Problem
The problem is that it takes upwards of an hour to package and deploy GitLab and then run tests. Any inefficiencies in packaging, deployment, or test execution are compounded when they affect multiple redundant pipelines. This has prompted several discussions and at least one rapid action:
- gitlab-org&8584 (closed)
- gitlab-com/www-gitlab-com#10557 (closed)
- https://gitlab.com/gitlab-org/quality/team-tasks/-/issues/1276
- gitlab-org&3806
There is considerable overlap between tests run on different environments (e.g., tests run via `package-and-qa` and on Review Apps in MRs). There is also some necessary redundancy between pipelines (e.g., the same tests run in MRs, then `master`, then Staging, etc.).
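
To make the overlap concrete, one option is to compare the sets of spec files each pipeline runs. The following is a minimal sketch only: the pipeline labels and spec names in `suites` are hypothetical placeholders rather than real data, and the helper functions `overlap` and `pairwise_overlap` are illustrative, not part of any existing tooling.

```python
from itertools import combinations

# Hypothetical data: the spec files each pipeline runs.
# Labels and spec names are placeholders for illustration only.
suites = {
    "package-and-qa (MR)": {"login_spec", "create_project_spec", "push_spec", "ldap_spec"},
    "Review Apps (MR)": {"login_spec", "create_project_spec", "push_spec"},
    "Staging": {"login_spec", "create_project_spec"},
}


def overlap(a, b):
    """Return the Jaccard overlap |a & b| / |a | b| of two test sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(suites):
    """Map each pair of pipeline names to the overlap of their test sets."""
    return {
        (x, y): overlap(suites[x], suites[y])
        for x, y in combinations(sorted(suites), 2)
    }


if __name__ == "__main__":
    for (x, y), score in pairwise_overlap(suites).items():
        print(f"{x} vs {y}: {score:.2f}")
```

A high score between two pipelines that run at the same stage (such as both in MRs) would point to redundancy worth discussing, whereas a high score between successive stages may be the necessary kind described above.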
Discussion prompts
- What's the minimum set of tests and pipelines that will provide optimal coverage?
- We've previously raised the possibility of retiring `package-and-qa` (e.g., if we use Review Apps for scenario orchestration). Should we revisit that option?
- Even if we reduce the time it takes to build and install GitLab (e.g., see some of the tasks in this rapid action), there will always be an irreducible amount of time required. Should we consider when it might be worthwhile running tests on GDK in CI instead of a production-like instance, so that we can have a test instance ready much faster (e.g., complete the smoke tests in 25 mins instead of 60+ mins)?