Add monitoring for the Helm charts publishing process

Background

When a GitLab version is released, Helm charts must also be published to charts.gitlab.io so that users who install GitLab via Helm can pick up the new version. The publishing process involves several hops:

  1. A tag is created in the canonical (or security) Charts repository on GitLab.com
  2. The tag is mirrored to dev.gitlab.org and a pipeline runs there
  3. The release_package CI job in that pipeline triggers a downstream pipeline in the charts/charts.gitlab.io repository on GitLab.com
  4. Once that downstream pipeline succeeds, the new chart version is live and available to users

Problem

Recently, a GitLab version was released but the corresponding Helm charts were not published to charts.gitlab.io. Release Managers (RMs) had no visibility into this failure; it was discovered only when a team member noticed and reported it to us.

The current RM process only checks whether the tag pipeline on dev.gitlab.org was successful. It does not verify the downstream triggered pipeline in charts.gitlab.io.

Reference: gitlab-org/release/docs!1014 (merged), and discussion in the revert MR charts/charts.gitlab.io!189 (comment 3152738040)

Goal

Add monitoring so that RMs are notified when the publish step fails.

Options

The following options are available. This issue is intended to kick off discussion so we can agree on an implementation plan.

Option A: Automate pipeline status checks with a Slack alert

Add an automated check to the dev.gitlab.org tag pipeline that polls the status of the triggered charts.gitlab.io pipeline after a release tag is created, and fails the tag pipeline if the triggered pipeline fails.
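
To make the discussion concrete, here is a minimal sketch of what the polling check could look like, in Python with only the standard library. It assumes the job already knows the ID of the triggered pipeline and has an API token that can read it; how the job gets either of those is not covered here. The function names, timeout, and polling interval are all hypothetical. The status is read from the GitLab REST API pipeline endpoint (`GET /projects/:id/pipelines/:pipeline_id`).

```python
import json
import time
import urllib.request

# Statuses after which a GitLab pipeline will not progress on its own.
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def fetch_pipeline_status(project, pipeline_id, token, api="https://gitlab.com/api/v4"):
    """Return the status string of one pipeline via the GitLab REST API."""
    # `project` is a numeric ID or a URL-encoded path,
    # e.g. "charts%2Fcharts.gitlab.io" (assumed here, verify before use).
    request = urllib.request.Request(
        f"{api}/projects/{project}/pipelines/{pipeline_id}",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["status"]


def wait_for_pipeline(get_status, timeout_s=3600, interval_s=30, sleep=time.sleep):
    """Poll get_status() until a terminal status or the timeout; return the last status."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES or waited >= timeout_s:
            return status
        sleep(interval_s)
        waited += interval_s
```

The CI job would call `wait_for_pipeline` with a small wrapper around `fetch_pipeline_status` and exit non-zero for anything other than `success`, so that a timeout is treated as a failure too. Passing the status getter and `sleep` in as parameters keeps the loop testable without network access.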

Option B: Add a post-release verification job to the Charts pipeline in Canonical

Add a CI job to the Charts tag pipeline which runs helm show chart --version ... and verifies that the chart has been published to charts.gitlab.io.

This would be similar to the [check-packages-availability](https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/a072dc9d053443bd88a502210c3213abea4ab785/gitlab-ci-config/dev-gitlab-org.yml#L1849) CI job in the Omnibus codebase.
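
As a rough illustration of such a verification job, the Python sketch below wraps the `helm show chart` command in a retry loop, since the chart may take a while to appear after the tag pipeline starts. It assumes `helm` is installed in the job image and that charts.gitlab.io is already registered under the `gitlab-repo` alias. The function names, attempt count, and delay are hypothetical, and the check relies only on the command's exit code.

```python
import subprocess
import time


def chart_is_published(chart_version, chart="gitlab-repo/gitlab", run=subprocess.run):
    """Return True if `helm show chart` can resolve the given chart version."""
    # Refresh the locally cached repository index first; otherwise a newly
    # published version would not be visible to `helm show chart`.
    run(["helm", "repo", "update"], capture_output=True, text=True)
    result = run(
        ["helm", "show", "chart", "--version", chart_version, chart],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def wait_until_published(chart_version, attempts=10, delay_s=60,
                         check=chart_is_published, sleep=time.sleep):
    """Retry the availability check; return True as soon as the chart is found."""
    for attempt in range(attempts):
        if check(chart_version):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no need to wait after the final attempt
    return False
```

The job would fail when `wait_until_published` returns `False`. Unlike Option A, this checks the end result that users see rather than the status of an intermediate pipeline.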

Option C: Add a manual checklist step

Warning

We don't want to add more steps to the checklist, so this is not really an option. I am noting it here just in case.

Add a step to the RM release checklist to explicitly verify the triggered pipeline in charts.gitlab.io and confirm chart availability via:

helm show chart --version <version> gitlab-repo/gitlab

Exit Criteria

  • Discuss!
  • Decide which option to implement / what to do
  • Implement!
  • Verify with the 18.11 monthly release
Edited by Dat Tang