Define a metric to measure and track Release Manager toil

Problem Statement

Release Managers are responsible for many tasks, including deployments, patch releases, security releases, and ad-hoc requests. Previously, all of our metric efforts have focused on tracking and understanding deployments via MTTP, Deployment SLO, and improved deployment pipeline observability. These metrics, and our efforts to improve them, allow us to safely deliver more changes to GitLab.com, but they don't show the overall effort or workload of release managers.

This issue intends to define a Release Management Effort (name TBC) metric that we can use to measure and track improvements to release management. With this metric, we can identify the most labor-intensive or brittle processes and plan work to improve them.

Current plan

There are many ways we can track release manager workload and lots of additional metrics that could help automate this. To keep things simple enough to get us started we'll try tracking the amount of time involved with each process plus any interesting additional data points that might help to plan improvements.

An MVP is being created on https://docs.google.com/spreadsheets/d/1xENgrQwAQkA3ImtxsnqgQYEGxxevbUeNhXLFRUUKayk/edit#gid=953768711

Now that the approach has been decided, I'm going to close this issue in favour of individual metrics issues on &744 (closed)

Ideas for more sophisticated metrics can be seen below:

Create a metric that combines an effort score for each activity with a multiplier to show frequency. For example, if we consider a patch release to be a 2/5 effort and we typically perform 2 per month we would have a monthly metric of 4 (2*2). Combining the scores of all release manager tasks would give us an overall metric that we could use to measure release manager workload over time.

Example (with indicative numbers)

Task	Effort	Frequency	Total (Effort*Frequency)
Monthly release	3	1	3
Security release	5	1	5
Patch release	2	2	4
Backport Patch release	3	1	3
Ad-hoc deployments	1	5	5

This would give us an overall score of 20. Future work to reduce the score might focus on reducing the Effort level of a task or reducing the Frequency of the task. Both would be expected to reduce release manager workload.
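The arithmetic behind the example can be sketched in a few lines of Python. This is only an illustration using the indicative numbers from the table; the names `tasks`, `task_total`, and `overall_score` are hypothetical and not part of any existing tooling.

```python
# Indicative numbers copied from the example table: (effort, frequency per month).
# These are illustrative values, not measured data.
tasks = {
    "Monthly release": (3, 1),
    "Security release": (5, 1),
    "Patch release": (2, 2),
    "Backport Patch release": (3, 1),
    "Ad-hoc deployments": (1, 5),
}


def task_total(effort, frequency):
    """Monthly contribution of one task: Effort * Frequency."""
    return effort * frequency


def overall_score(tasks):
    """Sum of Effort * Frequency across all release manager tasks."""
    return sum(task_total(effort, frequency) for effort, frequency in tasks.values())


print(overall_score(tasks))  # 20
```

Lowering either number for any task lowers the overall score. For example, if the Security release effort were reduced from 5 to 3, the total would drop from 20 to 18.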

Calculating task effort

Assuming the overall approach to creating the metric makes sense, we need to decide how to measure the Effort of each task. Each task is complicated enough that several factors should be considered, but we should also make sure that we end up with something we can track over time without excessive effort. Some ideas:

Number of manual steps - measured as a number. We could group into buckets to assess the effort of having manual steps.
Brittleness - the percentage of times the task "breaks" or needs release manager intervention to investigate or resolve something outside of the intended process. For example, we know that most, if not all, security releases will need merge train intervention.
Coordination level - a bucket measure to track the level of release management coordination needed to complete activities in a specific order or with external input. For example, a backport patch release has a higher coordination level than a patch release because we need to work with Quality to complete testing.

Examples:

Security releases - we know these have many manual steps to pause nightly jobs, execute chatops commands, review MRs, etc. Security releases are also brittle: we regularly need to resolve merge train conflicts, and they can be affected by broken master and broken stable branches. Coordinating with different developers, AppSec, and Quality also gives security releases a high coordination level.
Backport patch releases - we have 17 manual steps, which is fewer than security releases but more than deployments. We don't often have brittleness, but we do need to do some coordination to make sure the MR is merged and on Production, work with Quality for testing, and handle the initial backport request.
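To make the first two ideas concrete, here is a minimal sketch of how manual steps could be bucketed and how brittleness could be expressed as a percentage. The bucket boundaries and the function names are assumptions made purely for illustration; no buckets have been agreed in this issue, and the intervention counts in the usage lines are made up.

```python
# Hypothetical bucket boundaries as (maximum manual steps, bucket).
# These thresholds are placeholders, not agreed values.
MANUAL_STEP_BUCKETS = [(5, 1), (10, 2), (15, 3), (20, 4)]


def bucket_manual_steps(steps):
    """Map a count of manual steps onto a 1-5 bucket."""
    for upper_bound, bucket in MANUAL_STEP_BUCKETS:
        if steps <= upper_bound:
            return bucket
    return 5  # anything above the last boundary lands in the top bucket


def brittleness(interventions, runs):
    """Percentage of runs that needed release manager intervention."""
    if runs == 0:
        return 0.0
    return 100.0 * interventions / runs


# 17 manual steps is the backport patch release figure quoted above.
print(bucket_manual_steps(17))  # 4 with these placeholder boundaries
# Made-up example: 1 intervention across 4 runs.
print(brittleness(1, 4))  # 25.0
```

Coordination level is left out because it is described as a bucket assigned by judgment rather than a value computed from counts.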

Once we have agreed on the important elements to include in the effort score we can decide on the right buckets and consider how we can easily measure this with automation or similar.

Edited Mar 27, 2023 by Amy Phillips