Weekly deploys experiment & feature freeze back off

Summary

With our recent work to improve deployments, we can now experiment with loosening our feature freeze and deploying more frequently.

Plan

For the next few weeks (as part of the %14.6 release) we will experiment with a continuous delivery approach for Runner. To reduce the risk of introducing a regression in a stable release, we need to be sure that the version of Runner we want to promote to stable has been deployed on our main fleet for at least 1-2 days.

The process will go as follows:

Every Monday morning we will deploy to the private shard the version that is currently merged to main.

Each merge to main creates a DEB package and adds it to the https://packages.gitlab.com/runner/unstable/ repository. We will need to find the proper version number, as the Bleeding Edge is versioned like 14.5.0~beta.88.g6e602edd. We have a script for that in Runner's repository, so it should be as easy as git checkout CHOSEN_SHA && ./ci/version.
If no problems are reported or discovered, on Tuesday morning we will deploy the same version to the rest of the Linux runners fleet.
On the defined tagging day (explained below) we check which version is currently deployed. If no problems were reported, we start the x-y-branch on that commit and tag the stable version. Apart from that, the strategies for tagging, updating the changelog, merging back to main, and bumping the version file will not change.
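
To illustrate the version lookup in the steps above, here is a minimal sketch that splits a Bleeding Edge version string into its parts and checks it against a chosen commit. The pattern is inferred only from the single example 14.5.0~beta.88.g6e602edd, so the field names (`base`, `sequence`, `short_sha`) and the helper `built_from` are hypothetical, and the reading of the trailing part as a "g"-prefixed abbreviated commit SHA is an assumption. The authoritative source for the version remains the ./ci/version script.

```python
import re
from typing import NamedTuple


class BleedingEdgeVersion(NamedTuple):
    base: str        # e.g. "14.5.0"
    sequence: int    # numeric part after "beta." (88 in the example)
    short_sha: str   # assumed abbreviated commit SHA, without the "g" prefix


# Pattern inferred from the one example in the text (an assumption).
_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)~beta\.(\d+)\.g([0-9a-f]+)$")


def parse_bleeding_edge(version: str) -> BleedingEdgeVersion:
    """Split a string like 14.5.0~beta.88.g6e602edd into its components."""
    match = _PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"not a bleeding-edge version: {version!r}")
    base, sequence, short_sha = match.groups()
    return BleedingEdgeVersion(base, int(sequence), short_sha)


def built_from(version: str, full_sha: str) -> bool:
    """Return True if the package version appears to match the given commit."""
    return full_sha.startswith(parse_bleeding_edge(version).short_sha)
```

A check like this could confirm that the package deployed on Monday and the one deployed on Tuesday both correspond to CHOSEN_SHA before the rollout continues.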

Important days

The only important day is the tagging day.

After we tag the GitLab Runner project with the new version, we need about 1 hour for the pipeline to finish and for the packages and container images to be uploaded to the right places. Sometimes it takes longer because of the few flaky tests we have.

When the container images are uploaded, we can tag the stable version of Runner's Helm Chart. The pipeline here is much quicker, but we also need to be sure that the new version of the chart has been released before we continue.

When Runner's Helm Chart is released, we can update the integrations: directly in GitLab and in the Cluster Integrations projects. This requires action from these projects' maintainers, because the change needs to be merged. The integration in GitLab Helm Chart is updated automatically by a bot, but some time also passes between the release of the Runner Helm Chart and the preparation of the GitLab Helm Chart update MR. That MR again needs the appropriate maintainer to take action and merge it.

All of that defines the delay between when GitLab Runner must be tagged and when the GitLab version is tagged, so that all of the integrations are handled in time. This is especially important for Major releases: once or twice already we have had a problem where a new Major release of GitLab (for example 14.0.0) was released with an integration to GitLab Runner from the previous Major release (for example GitLab Runner 13.12.0).
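
The Major release mismatch described above can be expressed as a simple comparison. This is only an illustrative sketch: the function names `major` and `integration_is_aligned` are hypothetical, and nothing here reflects an existing check in GitLab or GitLab Runner tooling.

```python
def major(version: str) -> int:
    """Return the Major component of a version such as "14.0.0"."""
    return int(version.split(".")[0])


def integration_is_aligned(gitlab_version: str, runner_version: str) -> bool:
    """True when the integrated Runner shares GitLab's Major version."""
    return major(gitlab_version) == major(runner_version)
```

With the versions from the example, `integration_is_aligned("14.0.0", "13.12.0")` is false, which is the situation the tagging offset is meant to prevent.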

We need to confirm when GitLab's tagging day happens. Then we need to subtract some offset from it (probably 1 or 2 working days, to make sure that the required maintainers have time to take the actions needed to merge the change). This will define the GitLab Runner tagging day, which will then practically become the new Feature Freeze date.
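
The offset calculation can be sketched as stepping back a number of working days from GitLab's tagging day. This is a minimal sketch under stated assumptions: working days are taken to be Monday to Friday, public holidays are ignored, and the helper name `subtract_working_days` and the date used below are hypothetical.

```python
from datetime import date, timedelta


def subtract_working_days(day: date, working_days: int) -> date:
    """Step back from `day` by the given number of Monday-Friday days."""
    current = day
    remaining = working_days
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current
```

For a purely hypothetical GitLab tagging day of Monday 2021-12-20, an offset of 2 working days skips the weekend and gives Thursday 2021-12-16 as the Runner tagging day.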

Given the somewhat dynamic nature of the tagging events (which depend heavily on how GitLab's tagging for a given release falls relative to the working and non-working days before it), the Feature Freeze will also become somewhat dynamic, in comparison to the more static rule of the first working day after the 7th. But it will definitely move the freeze later in the month and almost align the Runner freezing time with the GitLab freezing time.
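
For comparison, the static rule can be sketched as follows. It assumes that "first working day after the 7th" means the first Monday-to-Friday day on or after the 8th and that public holidays are ignored; the function name is hypothetical.

```python
from datetime import date, timedelta


def first_working_day_after_7th(year: int, month: int) -> date:
    """Return the first Monday-Friday day on or after the 8th of the month."""
    day = date(year, month, 8)
    while day.weekday() >= 5:  # 5 and 6 are Saturday and Sunday
        day += timedelta(days=1)
    return day
```

Under this rule the freeze always lands between the 8th and the 10th, whereas a freeze derived from GitLab's tagging day moves with the calendar of each release.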

In past discussions with the distribution team, we were asked to have the Helm Chart ready no later than the 18th day of the month. I think that aiming for the 16th or 17th for Runner's feature freeze is a good initial iteration.

Expected results of the experiment

We will know what steps we need to take to prepare frequently repeated deploys of main. This will allow us to define what is needed to semi-automate them (so we can remove as much manual work as possible).
We will know how to update the Release Checklist template. We will need to adjust the tagging procedure a little, and we will need to completely rewrite the steps required for deployments (or maybe even extract these into dedicated issues; in fact, decomposing our huge release checklist into smaller chunks for each of the steps would be a nice improvement).
We will know which day is best for tagging so that all the integrations can be merged before GitLab's tagging happens.

Edited Nov 23, 2021 by Tomasz Maczukin