Audit of end-to-end test runs in various environments

Audit of E2E runs

The goal of this analysis is to make our E2E test runs more efficient and to reduce load and noise for the SETs on call.

Useful links:

Diagram that summarizes the places where we run e2e tests

– Source: https://docs.google.com/presentation/d/1A0G1_HE19Y3X2K3fTnKl0wKx-AaVetibl3UlA6i8NjQ/edit#slide=id.g13a3474b417_0_3

master - https://gitlab.com/gitlab-org/gitlab

Nightly package - https://gitlab.com/gitlab-org/quality/nightly

Staging-ref environment - https://ops.gitlab.net/gitlab-org/quality/staging-ref

Staging Ref is a sandbox environment used for pre-production testing of the latest Staging Canary code.

Staging-canary environment - https://ops.gitlab.net/gitlab-org/quality/staging-canary

Staging-Canary is an environment subset or deployment "stage" in the Staging environment, sharing most of the same infrastructure as Staging. This additional stage is designed to assist us with capturing issues arising due to mixed deployments, where we have multiple versions of one or more components of GitLab that share services such as the database. Information on how to access it, use it, and what services it covers is documented in our handbook page on canary stage environments.

Canary - https://ops.gitlab.net/gitlab-org/quality/canary

Production-Canary is an environment subset or deployment "stage" in the Production environment, sharing most of the same infrastructure as Production. This additional stage is designed to help us roll out new releases to end users in a more controlled fashion, with the aim of catching issues affecting users in a way that minimises impact.

Staging environment - https://ops.gitlab.net/gitlab-org/quality/staging

Preprod environment - https://ops.gitlab.net/gitlab-org/quality/preprod

The pre environment is used for validating the release candidates from which final self-managed releases and production patches are prepared. It does not have a full production HA topology or a copy of the production database.

Release environment - https://ops.gitlab.net/gitlab-org/quality/release

The release environment is used for validating security releases as well as final self-managed monthly and patch versions. It does not have a full production HA topology or a copy of the production database.

Production environment - https://ops.gitlab.net/gitlab-org/quality/production

Conclusions & Proposals

Preliminary notes

  • Staging & staging-canary look very stable (failure notifications are very rare)
  • Canary seems to have more failures than production
  • Do we need to run quarantined tests at all? These jobs are allowed to fail and don't seem to add any value.
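The stability observations above are qualitative. One way to back them with numbers is to compute a failure rate per environment from exported pipeline records. The following is a minimal sketch under stated assumptions: the record shape (`environment` and `status` keys) and the function name `failure_rates` are hypothetical and are not taken from any GitLab API response.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def failure_rates(pipelines: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return the fraction of failed pipelines for each environment.

    Assumption: each record has an "environment" key (e.g. "canary")
    and a "status" key whose value is "failed" for a failed run.
    Both key names are hypothetical.
    """
    totals = Counter()    # pipelines seen per environment
    failures = Counter()  # failed pipelines per environment
    for record in pipelines:
        env = record["environment"]
        totals[env] += 1
        if record["status"] == "failed":
            failures[env] += 1
    # Every environment in `totals` has at least one record,
    # so the division is always safe.
    return {env: failures[env] / totals[env] for env in totals}
```

Comparing the resulting rates for canary and production over the same time window would show whether canary really fails more often, or whether it only generates more notifications.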

Nightly

gitlab-org/quality/nightly

gitlab-org/gitlab

  • Proposal: Stop running e2e:package-and-test-ee on gitlab-org/gitlab nightly schedules, since this job already runs every 2 hours. Implemented.

staging-ref

From #174 (comment 1285274596):

staging-canary

  • Question: Do we need to run the full QA suite daily? We already run the full suite on master, staging-ref, canary and staging deployments.

  • Answered by Zeff at #174 (comment 1282036371):

Yes. staging-canary is our first opportunity to capture issues by testing in a more production-like environment. If we remove a full run, I would remove it from staging since the purpose of that environment now is to mimic what we already have in production and won't really help us catch something early in the process. The only other tests running here are smoke/reliable on deployments.

canary

staging

production

Edited by Rémy Coutable