Release 10.4 report
With release 10.4, we introduced changes to the release process to improve the stability of the RCs and reduce process bottlenecks.
Stats
Release-related work started on 2018-01-05 and finished on 2018-01-22.
During this period, we created 8 release candidates and one final release. One release candidate was not released to the public. One release candidate was part of the security release.
CE-EE merges became part of the Release Managers' tasks; during this period, 40 CE-EE MRs were created.
QA on staging was done for 7 releases.
Excluding time spent waiting for specs, package builds, and similar, approximately 20 hours of work went into preparing and releasing a version of GitLab.
Around 33 hours went into deployment tasks across staging, canary, and production.
[Figure: Release tasks time breakdown]
Deploy stats
As seen above, deploy-related tasks occupied roughly 60% of the time spent on release tasks.
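As a rough check on that split, the two totals quoted under Stats can be compared directly. This is a minimal sketch, assuming the 33 deployment hours are a total across staging, canary, and production (the report does not break them down); the variable names are illustrative only.

```python
# Hours quoted in the Stats section. Treating the 33 hours as a total across
# all environments is an assumption, not something the report states outright.
release_prep_hours = 20
deploy_hours = 33

total_hours = release_prep_hours + deploy_hours
deploy_share = deploy_hours / total_hours
prep_share = release_prep_hours / total_hours

print(f"deploy: {deploy_share:.0%}, preparation: {prep_share:.0%}")
# prints "deploy: 62%, preparation: 38%"
```

Under that assumption, deployment accounts for a little over 60% of the tracked hours.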
[Figure: Deploy tasks time breakdown]
RC1 deployment to staging took 180 minutes, during which we found a migration issue that resulted in RC1 being rejected from further release. RC2, deployed next, took only 30 minutes to deploy to staging because it contained only the fix for the RC1 problems.
RC2 was the first candidate to be deployed on GitLab.com. Due to the large diff between what was on GitLab.com and the new RC, the deploy took 240 minutes to complete.
In the week of 15-19 January, we deployed RCs to production daily. The smaller diff between RCs meant that each deploy to production took around 2 hours.
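To put the quoted durations side by side, here is a small sketch that tabulates them and computes the relative drop in production deploy time. The 120-minute figure is the approximate "around 2 hours" value from the text, not a measured average, and the labels are illustrative.

```python
# Durations in minutes as quoted in the report. The daily production figure
# is the approximate "around 2 hours" value, not a measured average.
deploys = [
    ("RC1", "staging", 180),
    ("RC2", "staging", 30),
    ("RC2", "production", 240),
    ("daily RC", "production", 120),
]

first_prod_minutes = 240   # RC2, first deploy to GitLab.com
daily_prod_minutes = 120   # approximate daily RC deploy
reduction = 1 - daily_prod_minutes / first_prod_minutes

for rc, env, minutes in deploys:
    print(f"{rc:<9} {env:<11} {minutes / 60:.1f} h")
print(f"production deploy time reduction: {reduction:.0%}")
```

With these numbers, the daily deploys took about half as long as the first production deploy.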
Release process changes
CE-EE merges
RMs were tasked with merging at least one CE-EE merge request in their timezone. Together with the automation that was completed to create merge requests and send notifications about them, this had a positive effect overall. Before the automation was in place, RMs owning this process meant that we succeeded in having the stable branches updated before the feature freeze.
CE-EE merges will become part of RM tasks.
QA task
The QA task was previously done by the Quality team (Edge). In this release, RMs would create an issue from the QA task template and invite people to check off their contributed changes. A strict deadline also meant that we could move on to the next tasks even if something was not checked off. If a change that was not checked off were to cause issues in the production deploy, RMs would revert the change and create the next RC. Luckily, this was not needed.
The QA task will become part of the release process.
Exception request
We introduced an Exception request template to be able to track the impact of a change that is requested to be merged after the feature freeze. This allowed RMs to have a clearer idea of possible impact, and decide how exception requests could be grouped.
The exception request process will be streamlined.
Communication
We communicated via Twitter and broadcast messages at every deploy. We announced each incoming Pages interruption and linked to public monitoring. While the interruptions were somewhat annoying, we had no major issues due to Pages going offline.
Less successful changes
RM daily task
We introduced a daily task issue to track what needed to be done by any of the release managers each day. Example of the daily task issue. This was useful for compiling this report, but it caused confusion for the Release Managers and others following the release. A dedicated release channel for RMs only proved to be more successful. It created a place where release handoff was the only thing discussed, which allowed RMs to sync up more easily. The #releases channel is too noisy for this purpose.
The daily task will not be used going forward; instead, a Slack channel for release handoff will be used.
First deployed RC QA
The QA task for the first deployed RC involved both Product and Engineering. This was largely confusing because it was the first time we did this and we didn't know what to expect.
QA for the first deployed RC will still include both Product and Engineering, but the format will change.
Creating stable branches before the 7th
We created the stable branches on the 5th to make sure that we were prepared for tagging RC1 on time. Since this was not communicated well to the rest of the company (short deadline for a large process change), it caused some unnecessary confusion. Going forward, if we decide to do this again, we will have to clearly inform everyone in #development beforehand.
Using Epics for tracking issues
We had a release epic where we tracked the release up to a certain point. This feature is still at a very early stage of development, so it is unsuitable for efficient use. For example, it is not possible to assign an epic from within an issue, and it is not simple to understand the relation between an epic and an issue. Meta issues work better at the moment.
Production deploy caused Pages downtime
Every time we deployed a new version of GitLab to GitLab.com, GitLab Pages would go offline for 2-3 minutes. See https://gitlab.com/gitlab-org/takeoff/issues/43 and https://gitlab.com/gitlab-com/infrastructure/issues/3546 . This needs to be resolved soon; the features are now in place for the change to go into production.
Going forward
We started consolidating the release process in the release group.
This group contains the tasks, docs, and tools projects. The docs project will have everything release-related in one project. Release-tools will be moved to tools as soon as we decouple process documentation from it.
Tasks will remain a place for Release Managers to create the issues for their release tasks and anything related to the release.
We will start using the tasks issue tracker to create a release meta issue, which will be used only as a place where one can find information on where to go with a request. Each individual release will have its own issue linked to this meta issue.
We will attempt to create an RC1 before the feature freeze again. A smaller diff between releases has proven to be very effective.