Block pipelines on smoke test failure on GDK (in the `e2e:test-on-gdk` pipeline)
What does this MR do and why?
Blocks pipelines if smoke tests fail when run against GDK (in the `e2e:test-on-gdk` pipeline). This will help avoid releasing bugs in the unlikely event that problems are not caught by unit/integration/system tests. This is part of the effort to avoid any more S1 incidents.
- Also blocks pipelines if the `build-gdk-image` job fails.
Part of gitlab-org/quality/quality-engineering&4 (closed).
Documentation to come: gitlab-org/quality/quality-engineering/team-tasks#1841 (closed) (covering this and all previous related changes).
Rationale - Recent GDK test results
This dashboard shows the last 30 days of `gdk-qa-smoke` jobs: https://dashboards.quality.gitlab.net/goto/B2DooDjVg?orgId=1
It shows that 7074 tests passed and 150 failed. The failures were because:
- a test detected a banned deprecation warning (see #417408 (closed) and thanks @jay_mccure for the fix !125844 (merged)!)
- a test failed with error 500, which was caused by a bug with seeding Issues. It was first reported in gitlab-development-kit#1874 (closed). See also gitlab-org/quality/pipeline-triage#204 (comment 1450366157)
This confirms that the only failures in the GDK tests were legitimate failures (not broken/stale tests or flakiness), and in both cases the failures weren't detected by other e2e tests (although one was detected in the main GDK project pipeline too, and the other was a deprecation warning that wouldn't have caused problems for users).
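For scale, the counts quoted above can be turned into a failure rate. This is a minimal sketch that uses only the two numbers reported from the dashboard; the function name `failure_rate` is illustrative and not part of any GitLab tooling.

```python
# Counts quoted above from the 30-day gdk-qa-smoke dashboard.
PASSED = 7074
FAILED = 150


def failure_rate(passed: int, failed: int) -> float:
    """Return the fraction of recorded test results that failed."""
    total = passed + failed
    if total == 0:
        raise ValueError("no test results recorded")
    return failed / total


rate = failure_rate(PASSED, FAILED)
print(f"{FAILED} of {PASSED + FAILED} results failed ({rate:.2%})")
# prints "150 of 7224 results failed (2.08%)"
```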
Effects on pipelines
Smoke test failures will block MR and scheduled master pipelines. We can also expect problems that affect GDK itself to block pipelines (e.g., if a change to GDK means the Dockerfile needs to be updated, or if there's a bug that needs to be fixed upstream). gitlab-development-kit#1874 (closed) is an example where such a problem was detected and fixed quickly, but there have been other cases that only affected GDK tests in gitlab-org/gitlab and left master broken for hours.
If that happens again, the quickest mitigation is to remove the `QA_RUN_TESTS_ON_GDK` CI variable from https://gitlab.com/gitlab-org/gitlab/-/settings/ci_cd. That will mean `e2e:test-on-gdk` will not be included in new pipelines (it won't help existing pipelines, but blocked MRs should become unblocked by starting a new pipeline). The problem can then be investigated and fixed without holding up development.
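The mitigation above is described as a change in the project's CI/CD settings page. As an alternative for anyone who prefers scripting it, the sketch below builds (but does not send) the equivalent request against GitLab's project-level variables API, `DELETE /projects/:id/variables/:key`. The helper name `build_delete_variable_request` and the `<token>` placeholder are assumptions for illustration; a real call would need a token with sufficient permissions on the project.

```python
import urllib.parse
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"


def build_delete_variable_request(
    project_path: str, key: str, token: str
) -> urllib.request.Request:
    """Build a DELETE request for a project-level CI/CD variable (hypothetical helper)."""
    # The project path must be URL-encoded, so "/" becomes "%2F".
    project_id = urllib.parse.quote(project_path, safe="")
    variable_key = urllib.parse.quote(key, safe="")
    url = f"{GITLAB_API}/projects/{project_id}/variables/{variable_key}"
    return urllib.request.Request(
        url, method="DELETE", headers={"PRIVATE-TOKEN": token}
    )


# "<token>" is a placeholder; nothing is sent by this sketch.
request = build_delete_variable_request(
    "gitlab-org/gitlab", "QA_RUN_TESTS_ON_GDK", "<token>"
)
print(request.get_method(), request.full_url)
```

Sending the request would be a separate, deliberate step (for example with `urllib.request.urlopen(request)`), and as noted above it would only affect pipelines created afterwards.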
Pipeline status when a smoke test fails in MRs with different changes:
Changes in MR | Pipeline | Expected | Actual |
---|---|---|---|
QA only | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937751239 | Fail | Fail |
BE code | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937735222 | Fail | Fail |
Docs | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937753640 | Pass | Pass |
(Tested via MRs with the changes noted above, with the MRs targeting 9c045db9 to make a test fail, with that commit following the changes in this MR)
Pipeline status is `failed` when the GDK image build job fails: https://gitlab.com/gitlab-org/gitlab/-/pipelines/936271688
Announcement (to be shared)
To be posted in Slack (`#quality`, `#development`, `#engineering-fyi`) and the engineering WiR doc:
- a regression, first reported in gitlab-development-kit#1874 (closed)
- a banned deprecation warning #417408 (closed)
If there are questions or concerns, please let us know in `#quality` (if necessary, the tests can be excluded from new pipelines by removing the `QA_RUN_TESTS_ON_GDK` CI variable).
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.