Define and Document Provision project management and planning process
Background
Having consistent processes across teams is generally positive: it provides consistency and makes it easier for teams to interoperate. However, a shared process doesn't always best address the specific needs of a given team. https://about.gitlab.com/handbook/engineering/development/fulfillment/#project-management-process provides a good starting point for Fulfillment team processes; this issue aims to build on that to find a process that works best for the Provision group.
Specific Areas to address
Refinement
Foundational to any capacity planning is having estimates for how long issues will take to complete. The Provision team doesn't currently have a formalized process for ensuring that all scheduled issues are weighted and refined before the start of the milestone.
Proposed Refinement Process
See https://gitlab.com/gitlab-org/fulfillment-meta/-/issues/1127.
Managing operational overhead
Every team deals with overhead to some degree, in the form of code reviews, on-call shifts, cross-team questions, and urgent requests. However, due to Provision's interactions with financial systems, there are additional compliance requirements that require ongoing triage of problems. These processes take time and reduce capacity; we should account for that reduced capacity in our planning process and document the reduction for future planning issues.
We have good existing documentation for what these processes are (https://gitlab.com/gitlab-org/customers-gitlab-com/-/blob/main/doc/provision_tracking_system/failure_monitoring.md, https://gitlab.com/gitlab-org/customers-gitlab-com/-/blob/main/doc/process/salesforce_and_zuora_sentry_issue_monitor.md). What would be helpful here is documenting how those processes tie into capacity planning and how much they reduce team capacity.
Proposed Process for managing overhead
For assigned processes (Provision tracking system or Sentry monitoring), we'll want to determine a reasonable reduction in development capacity based on those tasks and then account for it in our planning. Sentry monitoring is assigned monthly and tracked with an issue, so the expected weight of that process can be assigned to that issue and managed normally as part of the planning process. Provision tracking assignments are shorter, spread across the team, and not tied back to a specific issue; for time spent on that, a flat amount of weight could be assigned per developer per week assigned, and then deducted from expected throughput during the capacity planning portion of the planning process, similar to how PTO is handled.
For the specific weights to assign to these tasks, we'll want to gather more data to get better estimates. However, as a starting point, we could assign Sentry monitoring a weight of 3 and a weight of 1 to each week of provision tracking.
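To make the proposed deduction concrete, the arithmetic above could be sketched as follows. The weights (Sentry monitoring = 3, provision tracking = 1 per developer-week) are the starting points from this issue; the function and variable names are illustrative only, not an existing tool.

```python
# Starting-point weights proposed in this issue; expected to be tuned
# once more data is gathered.
SENTRY_MONITORING_WEIGHT = 3
PROVISION_TRACKING_WEIGHT_PER_WEEK = 1

def adjusted_capacity(base_capacity, provision_tracking_weeks,
                      sentry_tracked_in_issue=True):
    """Deduct operational overhead from a milestone's expected throughput.

    base_capacity: expected weight the team could complete with no overhead.
    provision_tracking_weeks: total developer-weeks assigned to provision
        tracking during the milestone.
    sentry_tracked_in_issue: if True, Sentry monitoring already has its own
        weighted issue in the plan, so it is not deducted again here.
    """
    deduction = provision_tracking_weeks * PROVISION_TRACKING_WEIGHT_PER_WEEK
    if not sentry_tracked_in_issue:
        deduction += SENTRY_MONITORING_WEIGHT
    return base_capacity - deduction

# Example: 40 weight of base capacity, 4 developer-weeks of provision
# tracking, Sentry monitoring handled via its own weighted issue.
print(adjusted_capacity(40, 4))  # -> 36
```

This mirrors how PTO is handled: the deduction comes off expected throughput rather than being tracked as a planned issue.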
Ensuring there is capacity and coverage for planned issues
The other side of capacity planning is ensuring that we have people available to do the planned work. The general process for planning team capacity treats capacity as a broad, interchangeable pool of frontend and backend capacity. But that doesn't reflect the reality of subject-matter expertise and experience with the systems involved.
We want to ensure that when an issue is put into a milestone there's a reasonably good chance of it being completed. Unexpected things will happen, so there are no guarantees that everything will go to plan, but we should still plan in a way that gets as much done as possible. To support that, we should find a method that works for the team for assigning a DRI to every issue put into the milestone plan.
Proposed Process for issue assignment
See https://gitlab.com/gitlab-org/fulfillment-meta/-/issues/1469.
Experiments
Data gathering around time spent on issues and processes vs. weights
In recent months we have picked up a number of new processes and have had consistent rollover of issues at the end of the milestone. This suggests that something is amiss with our estimation process. To narrow down where the problem is, for the %16.4 milestone the team has been asked to keep track of which days, and for roughly how long, they work on each issue throughout the milestone, as well as to note the timing and frequency of disruptions from processes not captured while working on issues. A minimal set of columns/attributes to record would be task, date, and approximate duration in hours.
The goal is to sum up the time spent on issues over the course of the month and compare it with the projected weight and the hours of work that weight would imply. It should also identify major time sinks in other processes that we may need to account for when estimating monthly capacity.
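The comparison described above could be done with a short script once the log exists. The (task, date, hours) record shape comes from this issue; the hours-per-weight conversion and the sample entries below are illustrative assumptions, to be replaced with the real data gathered during %16.4.

```python
from collections import defaultdict

# Assumed conversion between issue weight and hours; one of the things
# this experiment is meant to calibrate.
HOURS_PER_WEIGHT = 4

# Sample log entries in the minimal (task, date, approximate hours) shape
# proposed above. Issue names and values are made up for illustration.
time_log = [
    ("issue-101", "2023-09-04", 3.0),
    ("issue-101", "2023-09-05", 5.5),
    ("issue-102", "2023-09-05", 2.0),
]
issue_weights = {"issue-101": 2, "issue-102": 1}

# Sum logged hours per issue across the milestone.
actual_hours = defaultdict(float)
for task, _date, hours in time_log:
    actual_hours[task] += hours

# Compare logged time against the hours the assigned weight would imply.
for task, weight in issue_weights.items():
    expected = weight * HOURS_PER_WEIGHT
    print(f"{task}: logged {actual_hours[task]:.1f}h, expected ~{expected}h")
```

Tasks logged in `time_log` that have no entry in `issue_weights` (on-call disruptions, monitoring interruptions) would surface the unaccounted-for time sinks mentioned above.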
Pre-assignment of milestone issues
For the %16.4 milestone, all of the issues in the milestone were pre-assigned to engineers. Assignments were made based on who had historically interacted with the issues or the systems those issues concerned. This was a last-minute decision made the day before the start of the milestone, without feedback from team members. These shortcomings hampered the value of this initial attempt at pre-assigning issues.
This did, however, spawn the discussion that led to https://gitlab.com/gitlab-org/fulfillment-meta/-/issues/1469. With a bit of refinement, the ideas there should help us find a process that works for the team.