Iteration on triage process for issues that lack clear ownership by a group
Background
When an issue does not fall squarely within the focus area of a development group (i.e. the author is uncertain which group should be responsible for it), we currently default to either leaving the group label unassigned or applying the group::not owned label.
The group::not owned label was originally intended to signal that these issues are not directly owned by any group and that addressing them is a shared responsibility, to be taken on by whichever group is seen as fit.
However, there isn't a clear process that ensures these issues are assigned to a group for prioritization and execution. This means that those issues (especially the ones labeled group::not owned) fall into a backlog void and just linger there without a clear owner or a way to move forward.
There have been multiple previous conversations pointing at this problem and attempts to address it, but none produced a concrete proposal or one that addresses the problem holistically. There was, however, a previous agreement to remove the group::not owned label, as it is seen as an intermediate status that serves little purpose and creates two tiers of untriaged issues (i.e. those without any group label and those with this label), adding to the confusion and complicating triage ops automation.
For context, here are some previous issues and MRs that touch on the topic:
- Develop process for triaging and prioritizing work that is not within a group
- Delete group::not_owned label
- Add concept of facilitation ownership (replace not_owned concept), and apply to 3 categories
- Group attribution for shared items
- Proposal for a Platform/Core team in Development
- Remove testcases default labeling
Current state of affairs
Here are a few points to consider on where things sit at the moment:
- There is already a notion of 'Facilitated Product Areas' in the handbook, which speaks to this concept of shared responsibility where any team can (and arguably should) contribute to those areas. The guideline is that a group should be chosen instead of assigning the group::not owned label.
- The Tanuki-Stan project is running as a triage automation to assign a group label (based on a machine learning model) to issues that lack one 24 hours after creation.
- The quality department already has a process to pick up untriaged issues (including those missing a group and section label). Part of this process is to assign a group so they meet the partially triaged criteria.
- The group::not owned label has been removed from test case issues by the quality department.
- The group::not owned label continues to be used by issue authors and the backlog of these issues is growing.
Proposed approach
The approach to this problem is multipronged: handbook updates to provide clarity on the process, combined with automation to monitor and close gaps caused by deviations from the process.
The approach is broken down into Tasks in this issue, and it is based on the following premises:
- We need to build a generalized understanding that there are areas of GitLab that are a shared responsibility and that they need collaboration from multiple teams to agree on prioritization and assignment to specific groups for execution (even when it seems out of their primary focus area).
- Authors need clarity on what is expected of them in terms of the timing for assigning a group label as well as some guidance to choose a group according to their areas of focus.
- Group assignments for these shared areas are not necessarily final, but a way to trigger triage conversations (and escalations) in a scalable way across the organization.
- Automation for label assignment and triage reporting are valuable aids but they are not perfect, nor a replacement for individual responsibility to triage issues by the collective group leadership.
The high-level strategy of this proposal is to:
- Devise a decentralized triage process for issues when assignees think that the group assignment was done incorrectly. This should include a process to escalate the conversation from the EM/PM group level to the directorship level for the more difficult conversations.
- Update handbook documentation to educate stakeholders on the shared responsibility concept, the decentralized triage process and the existing breakdown of focus areas for each team.
- Develop automation to remind authors about group label assignments when missing on issue creation, including links to help docs.
- Update the message posted by Tanuki-Stan when it auto-assigns a group label to point to the new documentation on shared responsibility and the triage process.
- Remove the group::not owned label from the project to force specific group assignments by authors and/or quality engineering managers doing triage on new issues moving forward.
- Remove the group::not owned label from existing issues and let the Tanuki-Stan automation or the quality department triage process make group assignments as per the new process.
- Communicate this change broadly across the development department, starting with directors and then broadening to the entire organization.
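
As a rough illustration of how the automation pieces in the strategy above could fit together, the following Python sketch decides which follow-up an issue needs based on its labels and age. It is a sketch under stated assumptions, not the behavior of Tanuki-Stan or any existing triage tooling: the `Issue` class, the action names, the `plan_triage_action` function, and the idea that the author reminder happens before the 24-hour mark are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

DEPRECATED_LABEL = "group::not owned"   # label this proposal removes
GROUP_PREFIX = "group::"                # assumed prefix shared by all group labels
GRACE_PERIOD = timedelta(hours=24)      # window before a group is auto-assigned


@dataclass
class Issue:
    """Hypothetical, simplified issue record (not a GitLab API object)."""
    title: str
    created_at: datetime
    labels: list[str] = field(default_factory=list)


def has_owning_group(issue: Issue) -> bool:
    """True if the issue carries a group label other than the deprecated one."""
    return any(
        label.startswith(GROUP_PREFIX) and label != DEPRECATED_LABEL
        for label in issue.labels
    )


def plan_triage_action(issue: Issue, now: datetime) -> str:
    """Return the (hypothetical) follow-up action for a single issue."""
    if DEPRECATED_LABEL in issue.labels:
        # Strip the deprecated label first; the issue is re-evaluated afterwards.
        return "remove-deprecated-label"
    if has_owning_group(issue):
        return "none"
    if now - issue.created_at >= GRACE_PERIOD:
        # Past the grace period: hand over to automated or manual group assignment.
        return "auto-assign-group"
    # Still within the grace period: nudge the author, with links to help docs.
    return "remind-author"
```

Keeping the decision in one pure function makes the rules easy to review alongside the handbook text, independently of how labels are actually read or written.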
Relevant MRs
List of MRs that we create to address this issue:
- Introduce shared responsibility issues and a decentralized triage process
- Update (Tanuki-Stan) issue note with info on shared responsibility triage process
- Add references to shared services components docs to the triage process
Out of scope
This issue and its related tasks and MRs do not attempt to increase coverage by assigning specific groups to explicitly own the components that fall into the category of "shared responsibility" as defined in this issue. This is a good topic to discuss and iterate on, but for the time being, it is out of scope.