
add engr metrics throughput

Dalia Havens requested to merge eng-throughput into master

First iteration of adding an engineering productivity metric to our handbook. This commit describes the throughput model, which differs from using velocity/weights to capture a team's true capacity. There are many advantages to this approach, which I will go over in the MR and below.

1 throughput unit = 1 issue with code delivered to production, be it feature work or engineering-led work. This is key to reflecting the team's true capacity, and it is also important because it will allow us to understand how tech debt may be slowing down the development of feature work on a team.

As we are starting this off, the idea is to build it in a very lightweight way (1 issue with an MR = 1 throughput unit) and iterate further from there. Note that an issue can have multiple MRs attached to it; it will still count as 1 throughput unit, and we can evaluate and refine this in the next iteration.
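To make the counting rule concrete, here is a minimal sketch in Python. The `Issue` record and its fields are hypothetical stand-ins for whatever data we would pull from GitLab, not an existing API:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Issue:
    iid: int
    closed_on: date       # date the work reached production (assumed field)
    merged_mr_count: int  # number of MRs attached to the issue
    label: str            # e.g. "feature", "bug", "engineering"

def weekly_throughput(issues, week_start, week_end):
    """1 issue with at least one delivered MR = 1 throughput unit.
    Multiple MRs on the same issue still count as 1."""
    return sum(
        1
        for issue in issues
        if issue.merged_mr_count >= 1
        and week_start <= issue.closed_on <= week_end
    )

issues = [
    Issue(101, date(2018, 7, 2), merged_mr_count=1, label="feature"),
    Issue(102, date(2018, 7, 3), merged_mr_count=3, label="engineering"),  # 3 MRs, still 1 unit
    Issue(103, date(2018, 7, 5), merged_mr_count=0, label="feature"),      # no MR delivered: 0
]
print(weekly_throughput(issues, date(2018, 7, 2), date(2018, 7, 8)))  # -> 2
```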

A few things I need to clarify in the MR:

  • What problem are we solving
  • What is Throughput and how is it different from velocity or weights
  • Why use throughput over weights, which we currently have implemented on some teams (with success, I will add)

Here's a little insight into why I am recommending throughput and the value I have seen implementing this model with my previous teams. First, a little background: I've worked with many different sprint and kanban models (every team implements these a little differently) and have used both velocity and throughput.

A story I like to tell (which is not fictional, though the data in the graph below is very much made up): at my previous company, one of our teams had been using kanban with story points to weight issues, with tools to track capacity, velocity, etc. This was OK to start, but there was definitely some overhead to it. As the team grew, our velocity numbers (which were already somewhat inconsistent) became even more inconsistent: team capacity kept changing, we were onboarding new PMs, and shifts in vision meant grooming items that had not been scored. During this time we started to talk about adopting the throughput model: instead of grooming with the purpose of sizing, we would groom with the purpose of defining the smallest unit of work. This was really key to seeing a lot of value from this model. Engineers started to really focus on the goal and less on the numbers, and by looking at this weekly metric we could quickly identify patterns, especially when bottlenecks or productivity blockers started to affect a team.

Now take a look at this graph (again, this is made-up data, but it's very close to what my team was looking at one summer). We saw a dip in productivity: no engineering-led work made it to production, and only some bugs and a few features were delivered, which was considerably lower than our average/trend. During the summer we are used to seeing the team slow down due to vacation and time off, but this was different; the trend spanned multiple weeks, so vacation days were definitely not the issue here. We started digging and found out that our engineers were spending 3 to 4 times as long working on tests, fixing failures, and trying to get the pipelines green, with much frustration. An engineer might spend a day working on a feature but not be able to deploy it for 4 days (to give you another data point, we were deploying an average of 20 times a week, so our measure of throughput counted issues deployed to production). Our investigation identified issues with our test framework, which had become quite brittle over time. We knew we needed to dedicate some of our engineering capacity to resolving these issues and started making improvements to the tests and framework. We improved things quite a bit, so we didn't just catch up on productivity but exceeded it.

Understanding our capacity gave us a good measure for detecting when bottlenecks, trouble areas, or technical debt were slowing us down, and it also allowed us to see week to week how much we were investing in different areas such as bugs, engineering work, and features (let's leave this split to the next iteration of this model, but to give you a hint, it was another valuable data point :) ).

(graph: weekly throughput over the summer, showing the multi-week dip described above; data illustrative)
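As a quick sketch of the kind of week-over-week flagging described above: the numbers below, the four-week trailing baseline, and the 0.75 threshold are all my own illustrative choices, not the exact model from the story:

```python
def flag_slow_weeks(weekly_counts, window=4, threshold=0.75):
    """Flag any week whose throughput drops below `threshold` times the
    trailing `window`-week average: a crude bottleneck detector."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = sum(weekly_counts[i - window:i]) / window
        if weekly_counts[i] < threshold * baseline:
            flagged.append((i, weekly_counts[i], round(baseline, 1)))
    return flagged

# Illustrative data: a healthy ~20/week trend, then a multi-week summer dip.
counts = [21, 19, 22, 20, 18, 9, 8, 10, 19, 23]
print(flag_slow_weeks(counts))  # -> [(5, 9, 19.8), (6, 8, 17.2), (7, 10, 13.8)]
```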

A big thanks to @DouweM for reaching out to discuss this MR in more detail, and for being patient as we explore the differences between this model and the current weight model we are using. His recommendation to improve the detail of this issue is the basis for the updates I'll be making in the future.

Here's a list of differences between throughput and using weights:

  1. Throughput is a total count of the unit you would like to measure over a time interval. In this proposal I am suggesting we start with issues. Issues need to be the smallest deliverable possible, but they are not required to be exactly the same size. With weights, you add up issues weighted 2 or 3 and make sure to break down any issues weighted 5 or 8. That's a good model, and what you end up with is likely an average of 3 weight points per issue. With throughput you still need a well-defined, small issue, but instead of carrying an average weight of 3, each delivered issue simply counts as 1 (see the sketch after this list).
  2. Throughput is a measure of work completed, not an estimation. Because it is a true measure of capacity, you can rely on it to build a trend for the team and flag problem areas quickly. It also translates easily from team to team when looking at capacity, and is more accurate.
  3. Using throughput does not mean that EMs should not spend time looking over the issues they are assigning to their team. This is still a valuable exercise: you need to make sure an issue is well defined and the work can be done in a reasonable amount of time. You just don't need to put a weight on it.
  4. Throughput is a lighter model for new members of the team to implement. An EM with a lot of product history who is able to weight and size issues on their own will not see much change in time spent with this model. However, a new EM would need to spend significantly more time implementing the weight model than the throughput model, because estimating each issue takes more time and will likely involve more engineers on their team.
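To make point 1 concrete, here is a small worked comparison (the issue IDs and weights are invented for illustration): the weight model sums estimates, while the throughput model just counts delivered issues, and with similarly small issues the two track each other without the estimation step:

```python
# Hypothetical week of delivered issues: (issue_iid, estimated_weight)
delivered = [(201, 2), (202, 3), (203, 3), (204, 2), (205, 5)]

velocity = sum(weight for _, weight in delivered)  # weight model: 15 points
throughput = len(delivered)                        # throughput model: 5 issues

print(velocity, throughput)  # -> 15 5
# If issues are groomed down to a similarly small size (~3 points each),
# velocity is roughly 3x throughput, and the count alone carries the signal.
```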

Why consider implementing throughput?

  • It's a simpler model to implement.
  • Its goal aligns with our value of iteration (define small units of work).
  • More accurate measure of results.
  • Easier for new EMs to implement as it requires less overhead.
  • Ability to standardize better across teams.

cc/ @DylanGriffith @erushton @plafoucriere @bjk-gitlab @smcgivern @DouweM @marin

