
Block pipelines on smoke test failure on GDK (in the `e2e:test-on-gdk` pipeline)

What does this MR do and why?

Blocks pipelines if smoke tests fail when run against GDK (in the e2e:test-on-gdk pipeline). This will help avoid releasing bugs in the unlikely event of problems not being caught by unit/integration/system tests. This is part of the efforts to avoid any more S1 incidents.

  • Also blocks pipelines if the build-gdk-image job fails.

Part of gitlab-org/quality/quality-engineering&4 (closed)

Documentation to come: gitlab-org/quality/quality-engineering/team-tasks#1841 (closed) (covering this and all previous related changes)

Rationale - Recent GDK test results

This dashboard shows the last 30 days of gdk-qa-smoke jobs: https://dashboards.quality.gitlab.net/goto/B2DooDjVg?orgId=1

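It shows that 7074 tests passed and 150 failed. The failures had two causes:

  1. a regression, first reported in gitlab-development-kit#1874 (closed)
  2. a banned deprecation warning, #417408 (closed)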

This confirms that the only failures in the GDK tests were legitimate failures (not broken/stale tests or flakiness), and in both cases the failures weren't detected by other e2e tests (although one was detected in the main GDK project pipeline too, and the other was a deprecation warning that wouldn't have caused problems for users).
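For scale, the dashboard counts quoted above work out to a failure rate of roughly 2% of tests over the 30 days. A quick check of that arithmetic (the variable names are illustrative only):

```python
# Counts quoted above from the gdk-qa-smoke dashboard (last 30 days).
passed = 7074
failed = 150

total = passed + failed
failure_rate = failed / total

print(f"{failed} of {total} tests failed ({failure_rate:.1%})")
```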

Effects on pipelines

Smoke test failures will block MR and scheduled master pipelines. We can expect problems that affect GDK itself to block pipelines too (e.g., if a change to GDK means the Dockerfile needs to be updated, or if there's a bug that needs to be fixed upstream). gitlab-development-kit#1874 (closed) is an example where such a problem was detected and fixed quickly, but there have been other cases that only affected GDK tests in gitlab-org/gitlab and left master broken for hours.

If that happens again, the quickest mitigation is to remove the QA_RUN_TESTS_ON_GDK CI variable from https://gitlab.com/gitlab-org/gitlab/-/settings/ci_cd. That will mean e2e:test-on-gdk will not be included in new pipelines (it won't help existing pipelines, but blocked MRs should become unblocked by starting a new pipeline). The problem can then be investigated and fixed without holding up development.
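As a rough mental model of this mitigation, and not the project's actual CI rules, the sketch below treats the inclusion of e2e:test-on-gdk as a simple check on the project's CI variables. The function name `includes_test_on_gdk`, the variable value `"true"`, and the assumption that presence alone decides inclusion are all hypothetical.

```python
def includes_test_on_gdk(ci_variables: dict[str, str]) -> bool:
    """Hypothetical stand-in for the rule that adds e2e:test-on-gdk to a pipeline.

    Assumption: the pipeline is included whenever QA_RUN_TESTS_ON_GDK is
    present in the project's CI variables; the real rule may also check its value.
    """
    return "QA_RUN_TESTS_ON_GDK" in ci_variables


# Project CI variables before and after the mitigation (the value is illustrative).
before = {"QA_RUN_TESTS_ON_GDK": "true"}
after = {k: v for k, v in before.items() if k != "QA_RUN_TESTS_ON_GDK"}

# A pipeline's contents are decided when it is created, so only pipelines
# started after the variable is removed skip e2e:test-on-gdk.
existing_pipeline_has_job = includes_test_on_gdk(before)
new_pipeline_has_job = includes_test_on_gdk(after)

print(existing_pipeline_has_job, new_pipeline_has_job)
```

Under these assumptions, a pipeline created before the removal still contains the GDK tests, which is why a blocked MR needs a fresh pipeline to become unblocked.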

Pipeline status when a smoke test fails in MRs with different changes:

| Changes in MR | Pipeline | Expected | Actual |
|---------------|----------|----------|--------|
| QA only | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937751239 | Fail | Fail |
| BE code | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937735222 | Fail | Fail |
| Docs | https://gitlab.com/gitlab-org/gitlab/-/pipelines/937753640 | Pass | Pass |

(Tested via MRs containing the changes noted above. Each MR targeted 9c045db9, a commit that makes a test fail and that follows the changes in this MR.)

Pipeline status is failed when the GDK image build job fails: https://gitlab.com/gitlab-org/gitlab/-/pipelines/936271688

Announcement (to be shared)

To be posted in Slack (#quality, #development, #engineering-fyi) and the engineering WiR doc:

📣 We're no longer allowing end-to-end smoke tests to fail when run against GDK in MR and master pipelines (!126267 (merged)). We don't expect these tests to fail often - in the last month there has been no flakiness and only two failures:

  1. a regression, first reported in gitlab-development-kit#1874 (closed)
  2. a banned deprecation warning #417408 (closed)

If there are questions or concerns, please let us know in #quality (if necessary the tests can be excluded from new pipelines by removing the QA_RUN_TESTS_ON_GDK CI variable).

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Mark Lapierre
