What worked for us this month to drive execution in August
In the Monitor:Health group, our throughput was VERY consistent through August. From the week of July 29 to the week of August 26, our throughput numbers were: 9, 11, 10, 10, 10. This is fantastic in terms of leveling out our effort through the month instead of having a high-pressure rush to get everything merged the week of feature freeze like we used to. I think this is due to our new continuous delivery process for gitlab.com and to the better, more consistent backlog grooming we are doing in our weekly meetings.
What did not work well
Our overall throughput number was down in August (41) compared to July (49). It is hard to pinpoint a reason for this, but some factors may have been that we were onboarding two new backend engineers and one new frontend engineer and a lot of our work in the earlier part of August was investigative work for larger issues that didn't lead directly to MRs.
What we can improve moving into September
Hopefully the investigative work we did in August will help us gain momentum in September. Our engineers who started at the end of July and the beginning of August are now ramping up quickly, and that should also be reflected in our throughput. We will have one more backend engineer joining midway through September, so that may push our throughput down a bit.
Another thing to note is that for %12.3 we have more spikes (which don't lead to merged MRs as an outcome), so I am not certain we will see higher throughput for September. Something to consider.
Not sure I understand the spikes concern. Ideally this would level out the following week. We are not measuring release to release. Also, we have seen a general trend of leveling.
For Monitor:Health, we have more spikes (PoC investigations) that are quite large in size. As a result, engineers may spend more time investigating rather than having MRs merged into the codebase (which would indirectly decrease our team's throughput). A spike here and there shouldn't have a big impact, but this release we have 3 spikes.
What worked for us this month to drive execution in August
Having a backend maintainer means it's easier for us to get important changes through review and merged quickly
What did not work well
Several issues that required deep analysis before we could ship anything
Not necessarily a bad thing, but EKS support is a new domain for us, so it involves gaining familiarity with a lot of new tooling and APIs in order to ship even minimal features
What we can improve moving into September
Hoping that September issues end up being more straightforward due to investigation done in August
Hoping we can avoid getting stuck on large issues with a long time to first merge. Breaking things down and merging more quickly might help productivity simply through motivation.
2/3 of FE team members away for a week at a conference
Increase in leave taken due to public holidays and general holiday time for the US region
(PS: The items listed in what did not work well are good things, which we should do and encourage; I'm merely listing them here as contributing factors to a lower throughput for Aug.)
@jeanduplessis I want to make sure we are clear that the items you listed are, as you said, good things we should do and encourage. I am updating our template to include a new category, Notes, where we can make highlights such as the ones you listed, as well as other impacting factors such as onboarding new engineers.
I still want us to be critical of what did not work well such as long review cycles, impact from long spikes/investigative work, etc. These are things we can work to improve.
Perhaps the section question can be something like "What negatively impacted execution and/or throughput?" - that way we're not making a value judgement by default.
Finding reviewers/codeowners for some smaller projects has been difficult and made merging slow. We created this handbook MR with more details to track: !29057 (merged)
The APM team as a whole had high variation in the number of MRs week to week from a high of 16 to a low of 3.
What we can improve moving into September
Doing more investigation on issues before each milestone so we are sure they are ready for work when we start working on them.
Notes
The backend APM team did have some extra vacation time in August; about 10% of our time was vacation.
The APM backend team has 2 new engineers planning to start at the end of September, which could impact September and October, but will hopefully make for strong months going forward.
What worked for us this month to drive execution in August
We have agreed on the APM Planning board as our SSOT for prioritized work. This accomplished a number of good outcomes:
Sparked conversations in our 1:1s, prompting engineers to take ownership of the issues they want to work on while staying aligned with product's priorities
Allowed us to get a jump start on future milestone work when we encountered wait times due to blockers.
What did not work well
We are still digging our way out of the varying issues arising from different permutations of enabled/disabled feature flags.
What we can improve moving into September
We should prioritize some time to implement end-to-end (and ideally snapshot) tests for the metrics and cluster health dashboards. This would help us ensure we aren't breaking functionality as we continue to add new features and keep us aware of our blind spots as we introduce potentially breaking changes. It would also help prevent us from merging changes behind a feature flag that would break the dashboard when enabled.
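For concreteness, here is a minimal sketch of what a flag-permutation snapshot test could look like. It assumes a Jest + @vue/test-utils setup; the `DashboardStub` component, the `glFeatures` provide/inject, and the `newPanelLayout` flag name are all made up for illustration and are not the real dashboard code.

```typescript
// Sketch only: snapshot the rendered dashboard with the feature flag both off
// and on, so a change merged behind a flag cannot silently break the
// flag-enabled rendering path. Component and flag names are hypothetical.
import Vue, { CreateElement, VNode } from 'vue';
import { shallowMount } from '@vue/test-utils';

// Tiny stand-in for the real metrics dashboard component.
const DashboardStub = Vue.extend({
  inject: ['glFeatures'],
  render(h: CreateElement): VNode {
    const flags = (this as any).glFeatures as { newPanelLayout: boolean };
    return h('div', [
      h('h1', 'Metrics dashboard'),
      flags.newPanelLayout
        ? h('section', 'new panel layout')
        : h('section', 'legacy panel layout'),
    ]);
  },
});

describe('metrics dashboard feature-flag permutations', () => {
  it.each([false, true])('renders consistently with newPanelLayout = %s', (enabled) => {
    const wrapper = shallowMount(DashboardStub, {
      provide: { glFeatures: { newPanelLayout: enabled } },
    });

    // One snapshot per flag state; a regression in either path shows up as a diff.
    expect(wrapper.html()).toMatchSnapshot();
  });
});
```

The same pattern extends to the end-to-end specs: run the scenario once per flag state so both rendering paths stay covered as features are added behind flags.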
Notes
Frontend had at least 17% of our time spent on vacation, plus another 8% of partial capacity due to JSConf.
For the Monitor:APM team I looked through the list of MRs for July and August to find how many were duplicated to apply the change to both CE and EE:
| Month  | Total MRs | CE/EE Dups | Unique Changes |
|--------|-----------|------------|----------------|
| July   | 41        | 7          | 34             |
| August | 42        | 7          | 35             |
Now that we have a single codebase, I would expect the total number of MRs to be closer to that "Unique Changes" count. For example, in September we had 33 MRs, which is close to the number of unique MRs from previous months.
That said, there are other factors we should consider as well. I think we will be more productive going forward since we don't have the process overhead of opening duplicate MRs.
Worth noting: for Configure in September, the Frontend team switched to labelling certain MRs as devops::configure only, without using the group:: label. This is because these issues are not related to either of the teams and are part of a global optimization effort. Additionally, since the FE team was working across both groups, this also made more sense than arbitrarily choosing a group. This apparently corresponded to 11 MRs that in previous months would have been added to the Configure Orchestration total.
After discussion in a 1:1 yesterday I've decided to ditch the separate labeling process we have in Configure FE and align it with the group throughput labelling.
Therefore going forward (and I'll try to update MRs retrospectively for 12.3) the team members will use their respective group label for MRs, regardless of whether they do stage work or not.
Onboarding and delays in merge time were contributing factors
We did not hit our FY20-Q3 goal; we achieved ~90% of the 0% goal
In FY20-Q4 we're focusing on clearer guidance to engineers on breaking up work into smaller Issues and MRs, coaching on this front, as well as introducing various methods to provide more opportunities for team members to execute on small deliverables to drive up throughput.