When tests are skipped in the Rails codebase MRs, errors are passed to downstream Charts pipelines

Note: pending where this discussion goes, it may make sense to move this issue elsewhere (such as gitlab-org/gitlab). I'll start here in gitlab-org/distribution as this is where the problem is surfacing and go from there.

Context

Slack thread via David Dieulivol:

With !131791 merged, the internal review-apps will not be deployed automatically in MRs anymore (see gitlab-org/quality/engineering-productivity/team#285 (closed) for context).If you’d like to deploy an internal review-app as part of your MR, you can either trigger the start-review-app-pipeline manual job, or add the pipeline:run-review-app label to your MR and trigger a new CI/CD pipeline (see the docs).

Slack thread in #g_distribution

Summary

Review apps (and therefore QA tests) are not run automatically, and are set to manual instead.
Oftentimes, these tests are not manually run before an MR is merged.
This means that master of gitlab-org/gitlab can contain bugs that could have been caught in the MR pipelines.
This leads to downstream consumers being impacted by these bugs. For example, because gitlab-org/build/CNGbuilds off of master, pipelines can fail in Distribution projects such as gitlab-org/charts/gitlab for all branches (merge request branches, master, stable, etc.), blocking development.

More details

Run review-apps manually in MRs made review-apps manual, implementing the proposal from Proposal - Run internal review-apps manually.

This means that changes can be merged to master that are not fully validated. As a result, Distribution pipelines break due to failures in the upstream codebase rather than failures relevant to Distribution projects. Some examples:

I know it’s a tough problem, and I understand it from the infrastructure cost perspective, but I know in the past we’ve pushed for all tests to run upstream so they don’t rear their heads downstream. It takes a while to trace the problem back to the root cause, and disrupts our pipelines pretty severely for changes not related to our work.

Having it still run against master is good - but this will still end up leaking those errors into Distribution pipelines since we build the images off of master and use them in our review apps.

Next steps

The concerns raised in Proposal - Run internal review-apps manually are fair - it's quite expensive to run review apps for each of the many MRs that come through gitlab-org/gitlab.

That said, allowing bugs into master to be consumed downstream is expensive from a different perspective in terms of blocked pipelines, delayed development, and engineering hours spent digging for the root problem.

I'd love to find a good compromise that allows us to catch bugs before they get into gitlab-org/gitlab:master while still enabling rapid development feedback cycles at a reasonable infrastructure cost.

To get the ball rolling, we could look into GitLab CI rules that will automatically run review apps if certain files are modified (or other checks/criteria that identify potential fault points).

Welcoming any and all feedback here 👋

Edited Sep 21, 2023 by David Dieulivol