When tests are skipped in the Rails codebase MRs, errors are passed to downstream Charts pipelines
Note: pending where this discussion goes, it may make sense to move this issue elsewhere (such as gitlab-org/gitlab
). I'll start here in gitlab-org/distribution
as this is where the problem is surfacing and go from there.
Context
Slack thread via David Dieulivol:
With !131791 merged, the internal review-apps will not be deployed automatically in MRs anymore (see gitlab-org/quality/engineering-productivity/team#285 (closed) for context).If you’d like to deploy an internal review-app as part of your MR, you can either trigger the
start-review-app-pipeline
manual job, or add the pipeline:run-review-app label to your MR and trigger a new CI/CD pipeline (see the docs).
Slack thread in #g_distribution
Summary
- Review apps (and therefore QA tests) are not run automatically, and are set to
manual
instead. - Oftentimes, these tests are not manually run before an MR is merged.
- This means that
master
ofgitlab-org/gitlab
can contain bugs that could have been caught in the MR pipelines. - This leads to downstream consumers being impacted by these bugs. For example, because
gitlab-org/build/CNG
builds off ofmaster
, pipelines can fail in Distribution projects such asgitlab-org/charts/gitlab
for all branches (merge request branches,master
,stable
, etc.), blocking development.
More details
Run review-apps manually in MRs made review-apps manual, implementing the proposal from Proposal - Run internal review-apps manually.
This means that changes can be merged to master that are not fully validated. As a result, Distribution pipelines break due to failures in the upstream codebase rather than failures relevant to Distribution projects. Some examples:
- Failure in browser_ui/1_manage/login/login_via_oauth_and_oidc_with_gitlab_as_idp_spec.rb | Manage OAuth behaves like Instance OAuth Application creates oauth application and uses it to login
- Database error: 'ActiveRecord::StatementInvalid: PG::CheckViolation: ERROR: check constraint "check_0dd5948e38" of relation "users" is violated by some row'
- Migrations failure: 'PG::UndefinedTable: ERROR: relation "design_management_repositories" does not exist'
- review-apps - ERROR: relation “postgres_partitioned_tables” does not exist at character 85
I know it’s a tough problem, and I understand it from the infrastructure cost perspective, but I know in the past we’ve pushed for all tests to run upstream so they don’t rear their heads downstream. It takes a while to trace the problem back to the root cause, and disrupts our pipelines pretty severely for changes not related to our work.
Having it still run against master
is good - but this will still end up leaking those errors into Distribution pipelines since we build the images off of master
and use them in our review apps.
Next steps
The concerns raised in Proposal - Run internal review-apps manually are fair - it's quite expensive to run review apps for each of the many MRs that come through gitlab-org/gitlab
.
That said, allowing bugs into master
to be consumed downstream is expensive from a different perspective in terms of blocked pipelines, delayed development, and engineering hours spent digging for the root problem.
I'd love to find a good compromise that allows us to catch bugs before they get into gitlab-org/gitlab:master
while still enabling rapid development feedback cycles at a reasonable infrastructure cost.
To get the ball rolling, we could look into GitLab CI rules
that will automatically run review apps if certain files are modified (or other checks/criteria that identify potential fault points).
Welcoming any and all feedback here