We have multiple-day-breached FRT tickets that do not get the level of visibility needed to prioritise them.
What is the problem?
For a variety of reasons, we sometimes have new tickets that go multiple days without a first response comment being made on them.
Why is this a problem?
Having a ticket go multiple days without a response is a very poor customer experience and can lead to escalations (STARs) and frustration for both customers and team members.
Proposal
Create an alert with a list of all new tickets that are older than 2 days with no response. This alert will go to managers, who will be responsible for finding an assignee for the ticket and prioritising a response to it.
A Very Breached ticket is any ticket that has gone for more than 2 days with no first response made on it.
The alert is triggered daily with an attached list of tickets.
The alert is made in Slack in either the #spt_gg_forest or #spt_managers channel and [at]mentions all managers globally OR
The alert is made as a STAR digest (i.e. a single STAR entry with a link to all Very Breached tickets) in the #support_ticket-attention-requests channel and [at]mentions the regional on-call managers
Support Operations help may be needed to automate the list of very breached tickets and implement the alerting solution.
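As a rough illustration of how such a list could be generated, here is a minimal sketch, assuming the standard Zendesk Search API is used and that a ticket still in "new" status is a reasonable proxy for "no first response yet"; the subdomain, credentials and query details are placeholders, not the actual implementation:

```python
# Minimal sketch (not the actual implementation): find tickets created more
# than `days` ago that are still in "new" status, used here as a rough proxy
# for "no first response yet". Subdomain and credentials are placeholders.
import os
from datetime import datetime, timedelta, timezone

import requests

ZENDESK_BASE = "https://example.zendesk.com/api/v2"  # placeholder subdomain
AUTH = (f"{os.environ['ZD_EMAIL']}/token", os.environ["ZD_API_TOKEN"])


def very_breached_frt_tickets(days=2):
    """Return tickets older than `days` with no first response (approximate)."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%d")
    query = f"type:ticket status:new created<{cutoff}"
    resp = requests.get(
        f"{ZENDESK_BASE}/search.json",
        params={"query": query, "sort_by": "created_at", "sort_order": "asc"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])
```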
Potential Roadblocks/Things to consider
This issue is focused on FRT tickets only for now and does not include very breached NRT tickets.
This alert is a safety mechanism for when we have already failed to respond in a timely manner once a ticket has breached. There could be many reasons why this fail state has occurred, but the focus for this issue is to introduce a safety mechanism, not to dig deeper into the underlying causes of why a breached ticket has gone multiple days without a response. That investigation should take place in another issue.
Once we have deployed and refined this alert, we can use another issue to define a similar approach for NRT tickets. That is out of scope for this issue.
Desired Outcome
Managers are alerted on a regular basis for tickets that are very breached.
What does success look like?
Once the list of very breached tickets has been given to managers, all tickets on the list get an assignee and a first response as a priority.
How do we measure success?
All tickets included in the alert are assigned and have a first response made within 1 day.
Regarding the implementation ideas in the Proposal section:
The regions are taking a couple of different approaches to how managers correlate to SGGs, so the implementation method should be sensitive to this.
I strongly do not want the on-call manager to deal with these. On-call shifts are stressful and busy, at least for EMEA/AMER.
Please do not tag "all managers globally" in a notification about this; I don't want to see these before- or after-hours. I would much prefer to go by the DRI model, and have us identify a specific role to deal with these. Maybe it's the SSAT Manager, if we believe these missed FRTs are strongly correlated with SSAT. Maybe it's the regionally-associated SGG manager. See next item for a suggestion.
The forest channel seems a reasonable choice for a Slack post, possibly inside the daily regional thread; all regionally available managers are tagged in the thread already (it respects OOOs), and participants could decide who-does-what (maybe we want to split the work). Most of the threads currently have zero comments in them, so it won't overload a busy channel/thread.
The alert is made as a STAR digest (i.e. a single STAR entry with a link to all Very Breached tickets) in the #support_ticket-attention-requests channel and [at]mentions the regional on-call managers
For similar reasons to those Rebecca has highlighted, Manager on Call can be very stressful in the EMEA region, as we get a lot of STARs and also often have to coordinate or assist with Customer Emergencies. For this reason, my preference would be to update the Forest SGG like Jane did yesterday, but also tag the global or impacted regional management to take action using the appropriate Slack handle.
I wonder if there's an opportunity to partner with Senior Support Engineers on this?
Maybe a digest of very breached FRT go to both Seniors and Managers, where Seniors help facilitate assignment/first replies within their SGGs, pulling in managers for additional support when needed.
Seniors could offer to pair and/or add themselves as cc's to provide support (as needed) to the assignee.
Seniors could get a first reply out and encourage others to shadow for those who are interested in the ticket, but feel the ticket is out of their comfort zone.
Might make this feel less like yet another thing Managers are chasing down help for, while infusing and strengthening collaboration among the SEs and SSEs.
Definitely sensitive to this possibly adding more weight to already full plates, but thought I'd suggest an idea from a different angle that (to me) feels more natural, as Seniors are already working out of the queues alongside their SGGs, and this would offer an opportunity to further strengthen our approach to FRT.
I think this makes a lot of sense, and while I'm pushing forward with my updated proposal, which does not include Senior Support Engineers at this time, I do think it could be included in a future iteration.
My priority is to have some kind of alerting workflow in place as soon as we are able to, which means I am looking to implement the smallest scoped solution as quickly as we can with a commitment to continue to iterate on how we deal with Very Breached Tickets.
A few thoughts: (numbered list to allow for reference-ability)
Currently, Monday business hours typically begin with an average of 10+ tickets showing -2d breached, many of which were raised late on the customer's Friday. If APAC sets out to clear all of those as a priority when business hours start, there is a risk that it will have an adverse impact on keeping ahead of tickets that are coming up for breach.
SCOPE CREEP (noted for a later iteration): In many ways I would like to see us move the net to a different point in the cycle. I appreciate this is a safety net to stop things getting overly breached, knowing that we won't ever stop everything from breaching (and in fact we have provision to miss 5%), but particularly for those that occur in the above scenario, something like a list, produced 4-6 hours before end of business hours on Friday, of tickets that will breach between then and the first 2 hours of business hours the following week could be useful. Again, this is really beyond the scope of the current topic, but worth keeping in mind.
Building on Rebecca's comments about the need for a DRI and not overloading an already distracting and busy on-call manager responsibility, we need to avoid the risk of "diffusion of responsibility". We can be flexible about how that looks, and it may differ from region to region; I would encourage us to be clear about what we agree on without insisting it has to look the same everywhere.
QUESTION (for all of us): how do we load balance this out globally? I think using the Forest Slack channel is a good approach in terms of where this goes (STARs are already busy and distracting enough, and the SSAT review manager is a role requiring less timeliness than this warrants; for example, if I am on PTO for 3 days of my SSAT review allocation, I don't get someone to swap, I just make sure I plan to get it done around those days of PTO), but we also need to decide/document what the trigger is for this alert, or whether it will run on a schedule at various times during the week/business hours. I'm still pondering this so don't have suggestions right now, but wanted to raise the need to solve that.
I think I've addressed your concerns on diffusion of responsibility and load balancing globally (points 3 and 4) with my updated proposal, which is adding to the current daily message in spt_gg_forest. If I haven't though, please do let me know so we can work out what needs to change for this iteration, and what could possibly be included in a future iteration.
I'm definitely in agreement with you on your second point; I would prefer not to solve that at this time though, and to prioritise getting an alert of some description in place first, with future iterations giving more focus on solving the problem before it happens.
APAC only: Jane, Ket and myself are comfortable with having the on-call support manager be accountable for Very Breached FRT Tickets during APAC shift hours. To be clear, "accountable" here means that the APAC support manager on-call will ensure that Very Breached FRT Tickets are taken care of, and we may eventually choose to delegate the responsibility for monitoring and actioning these to one or a group of support engineers.
Thanks everyone for contributing to this. It seems the most desirable first iteration is leveraging the spt_gg_forest channel for the Where to alert/report, and the on-call managers per region as the Who to report to.
Bearing that in mind I'll refine the proposal for the first iteration to:
An automated report of all tickets that are >=2 days old and have not yet had a first response is attached to a message in the #spt_gg_forest Slack channel, pinging the on-call managers for that day.
I think we could leverage the current daily welcome message (made by the Support Team Bot) that tags the regional managers in that Slack channel and add a section that includes the report of Very Breached Tickets. That way it's a single message with all the relevant information needed by managers, rather than adding additional messages that are vying for attention.
@gitlab-com/support/support-ops Shaun has asked me to get the first iteration of this underway while he is on PTO.
The desired outcome is as follows:
Append an additional section to the 3 daily Forest Group pings that get sent to APAC, EMEA and AMER. Currently the sections are "The following people have scheduled PTO", "SGG Report" and "Oncall Information"; this would add a "Very Breached FRTs" section.
This new section will have content of a list of tickets that meet the following criteria:
Are at FRT stage
Have an SLA
The SLA has been breached for more than 2 business days
Ideally the ticket numbers would be clickable links
The output could look like the following: (NB: These tickets are NOT very breached, I've just chosen a couple of random tickets for the purposes of illustrating the desired format)
I should note that the list will often be empty. A blank list is OK in that case, though perhaps "Nil" as the content would be useful in that instance, as a check that the list is actually empty as opposed to something not having worked.
In terms of timing: if the list of tickets could be queried as close as possible to the time the regional Forest ping is sent, that would be ideal. I am hopeful that could be within the 30 minutes prior to the post, but I appreciate there will be technical limitations on what can be achieved there, so let's discuss that if necessary.
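To illustrate how the section above could be rendered (clickable ticket links, an explicit "Nil" when the list is empty), here is a minimal sketch; the webhook URL, Zendesk subdomain and exact formatting are assumptions rather than the agreed implementation:

```python
# Minimal sketch (assumed format, not the agreed implementation): build the
# "Very Breached FRTs" section with clickable ticket links, using an explicit
# "Nil" when the list is empty, and post it via a Slack incoming webhook.
import os

import requests

WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]                 # placeholder webhook
TICKET_URL = "https://example.zendesk.com/agent/tickets/{}"   # placeholder subdomain


def format_vbt_section(ticket_ids):
    if not ticket_ids:
        # "Nil" distinguishes "no VBTs today" from "the query did not run".
        return "*Very Breached FRTs*: Nil"
    links = ", ".join(f"<{TICKET_URL.format(tid)}|#{tid}>" for tid in ticket_ids)
    return f"*Very Breached FRTs* ({len(ticket_ids)}): {links}"


def post_to_slack(text):
    requests.post(WEBHOOK_URL, json={"text": text}, timeout=30).raise_for_status()
```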
I don't think the SGG Bot (which is what that is) is the correct method for doing this:
For one, it is going to require the bot to now access another system to gather data. We should avoid doing that
For two, the exact information is visible via the "All FRT and Emergencies" view in Zendesk. The tickets would be at the top of the view, given breach time.
Thank you for sharing your guidance on this Jason.
I do need to keep this moving though, so could you please help me by suggesting a more correct method to achieve:
an alert in Slack of Very Breached Tickets (VBT), where a VBT is any FRT that has breached for more than 2 business days?
As mentioned previously, it would be preferable if this could come out once per day for each region with a list of tickets meeting the criteria, but I don't want to put limits on the ways you might suggest we achieve an alert as I did in my previous scoping of the desired outcome, so let's leave it as open as that. Can you help please?
Also, yes, we are aware that these VBTs are already visible to managers in the "All FRT and Emergencies" view in Zendesk; however, the goal of this issue is to implement an alert in Slack drawing attention to these. That view is only visible to managers; each region's managers will determine the best way to action this alert, and this may involve Senior+ SEs who do not have visibility of that view.
We do not really do "Slack notifications for xyz" like this anymore. It does not scale (and is not very accurate). We have also seen (time and time again) that it uses resources without producing any results (on the contrary, they end up ignored).
That view is only visible to managers
It is visible to Managers and CMOCs currently, not just managers. But that is also something that can readily be changed.
The problem statement is:
We have multiple-day-breached FRT tickets that do not get the level of visibility needed to prioritise them.
I disagree with that problem statement, namely in that we do have visibility into them. The problem is not doing anything about them. Copying the visibility we very much already have outside of our primary tool (Zendesk) is not really going to change that. We already have the very thing we need to see the tickets and do something about them. If the view needs to be widened to be accessed team wide, that makes sense and would fall in line with normal procedures and documentation (and is very much doable).
Let me try to express what I was trying to say again, as the previous wording might not have done so in the clearest way.
We need to be careful in granting bots API token access, especially considering all Zendesk API tokens are at the admin level (and can be very destructive in nature). If at all possible, we should avoid granting a bot access to systems like that, as it opens it to new vectors of security risk. As such, I would advise we not have this tied to something that has no connection to Zendesk at this time.
As the core of the ask is just to take a list of very breached FRT tickets and post them in Slack, that is likely more doable. The mention of the SGG bot makes it seem like #spt_gg_forest is the desired channel and the goal is to post them around the same time the SGG bot fires (you mentioned around 30 minutes, so I will assume at the same time unless specified otherwise).
The goal is to have a single line comment made listing the tickets in a comma separated fashion (with the IDs being clickable if possible).
Do I have that all correct? If not, please let me know where I am off so I can adjust my understanding. If it is correct, I can make an issue to determine how to engineer the solution you are seeking.
we do have the visibility into them. The problem is not doing anything on them.
That is absolutely correct.
But I think Slack provides a higher level of visibility/urgency than ZD, simply due to the nature of the two tools and how we instinctively use them. In general, I think we notice and respond to alerts in our key Slack channels more effectively than we perform regular reviews of the queue status pages in ZD. And unfortunately we are not so much in the position at present of regularly saying "I've run out of ticket/manager work - I'll check the queues" and more in the state of saying "oh-oh, there's an urgent alert in Slack about a ticket - I'm pretty slammed, but I'd better stop what I'm doing and take a look". If we could get everyone reliably checking ZD as often as they notice Slack messages that would be ideal, but I don't see that happening (in general, that is - in this specific case the requirement is for one person to check one queue in ZD once per shift, which I guess a calendar reminder could take care of).
We also have seen (time and time again) that it uses resources without producing any results (to the contrary, they end up ignored).
The SGG alerts for breaching tickets have had a positive effect on our FRT handling, despite them only telling us what ZD already does. So do please count that as a win for your efforts implementing the Slack alert functionality.
Thanks Jason, and thanks too to you Justin for sharing the positive impact you've observed from the Slack alerting function!
Jason - a first attempt to answer your questions and come to an agreed understanding:
We need to be careful in granting bots API token access, especially considering all Zendesk API tokens are at the admin level (and can be very destructive in nature). If at all possible, we should avoid granting a bot access to systems like that, as it opens it to new vectors of security risk. As such, I would advise we not have this tied to something that has no connection to Zendesk at this time.
Understood. Thank you for the insight into this, that really helps to understand.
As the core of the ask is just to take a list of very breached FRT tickets and post them in Slack, that is likely more doable. The mention of the SGG bot makes it seem like #spt_gg_forest is the desired channel and the goal is to post them around the same time the SGG bot fires (you mentioned around 30 minutes, so I will assume at the same time unless specified otherwise).
Kind of! To clarify, from discussions I had with Shaun about this prior to him being on leave, the preference was to leverage the existing daily Forest ping for this, because it already pings the Managers in the region who are present that day. The feedback received when Shaun first proposed this issue resulted in a scenario that didn't have a single answer everyone would be happy with. So putting the information (alert) in the existing ping that goes out to all managers was the solution he decided on for this first iteration, as it means that it:
doesn't assign the oncall manager for the day - at least one manager requested this NOT be done as the oncall manager responsibilities were already very loaded
doesn't send an alert with no one tagged, which would lead to significant diffusion of responsibility
supports regional managers to devise an acceptable workflow for who would be DRI for acting on this alert in their region (this is what I was referring to in my initial ping to support-ops when I said I will work with the regional managers in a separate thread to document how we respond to/action the list. Some regions may choose to use the oncall manager, others may choose to work with Senior SEs to action these, but getting the alert published near the beginning of each region's business day was the foundation for devising the process to action on it).
Given the need not to have a list generated from Zendesk under the bot's actioning, I appreciate this answer doesn't actually meet the criteria you've explained, but please read on - the next response might suggest a way an alert as a line within the existing Forest daily ping could be achieved while observing that decoupling...
The goal is to have a single line comment made listing the tickets in a comma separated fashion (with the IDs being clickable if possible).
Yes, that would work, though an alternative could be to have a clickable link that states how many VBTs there are to be actioned - e.g. a link to a generated issue with a title like "There are currently 7 VBTs needing action", where the issue (or other artefact) contains the list of individual tickets. That's not a requirement though, just expressing that it does not have to be a comma-separated list - the minimum needed in the alert is the count of how many VBTs there are, and the means to view the tickets included in that count (using a count of VBTs with a link to the list hadn't occurred to me previously).
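For illustration only, a minimal sketch of this "count plus link" alternative, assuming a GitLab issue is generated via the standard REST issues API and only its URL and the VBT count are posted in Slack; the project ID, token and ticket URL format are hypothetical:

```python
# Minimal sketch of the "count plus link" alternative (hypothetical project,
# token and ticket URL): create a GitLab issue listing the VBTs and return
# its URL, so Slack only needs the count and a single clickable link.
import os

import requests

GITLAB_API = "https://gitlab.com/api/v4"
PROJECT_ID = os.environ["VBT_PROJECT_ID"]      # hypothetical tracking project
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}


def create_vbt_issue(ticket_ids):
    """Create an issue listing the VBTs and return its web URL."""
    description = "\n".join(
        f"- https://example.zendesk.com/agent/tickets/{tid}" for tid in ticket_ids
    )
    resp = requests.post(
        f"{GITLAB_API}/projects/{PROJECT_ID}/issues",
        headers=HEADERS,
        data={
            "title": f"There are currently {len(ticket_ids)} VBTs needing action",
            "description": description,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["web_url"]
```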
Do I have that all correct? If not, please let me know where I am off so I can adjust my understanding. If it is correct, I can make an issue to determine how to engineer the solution you are seeking.
Have a read of my responses and let's see if we have the same understanding yet.
An aside: I am in agreement with your observation that the problem here is not exclusively about lack of visibility, but rather a combination of needing timely attention and action; the goal in this first iteration is to implement an element of alerting, and I will work with the regional managers to devise an agreed method for giving these tickets timely attention and action. I do presently review the VBTs as I have time, and the vast majority of these alerts will be on APAC Monday (there would have been 11 today, though 1 of them may have gone out on AMER Friday) and very few will go to other regions I expect. But we need the safety net in place to catch these more actively - Justin's observation about attention and load is very valid. This is very much a first iteration to get the safety net in place. I am keen to catch these sooner than 2 days breached - Shaun responded to that in this comment and notes we'll look to work towards that in a later iteration.
I think we can make something doable, but the specifics of it might not align with the above statements, namely:
comes from the SGG slackbot
I would not advise this, as the bot is already accessing Google (Calendar), PagerDuty, and the support-team.yaml (admittedly that will change in the future). I am not quite comfortable with the bot/scripts also having admin capabilities on Zendesk global.
even if we make a gitlab issue
something would need admin access to generate the list in the issue, which means making one bot create issues (3 times a day) and another bot (the SGG slackbot) make a gitlab.com API call to locate the corresponding issue (which could be tricky to do correctly).
I think it is better here if we focus on the core of what is being asked without digging into the "how" (i.e. look into the what and why).
As the core is a Slack message containing a list of tickets (or a link to an artifact containing said list), I think this should be doable as something completely separate from other bots (that don't have the access to do so). The actual text of the message can be determined through the course of testing/development, so pinging managers (or not pinging managers) is doable.
My thought on the exact nature of the bot (if that is important) would be:
Runs 3 times a day
0700 UTC M-F for EMEA
1400 UTC M-F for AMER
2130 UTC M-F for APAC
Generates a message along the lines of:
Hey @person1 @person2 @person3, please review the following very breached FRT tickets: LINK LINK LINK LINK
Posts the generated message in the #spt_gg_forest
I think the above should cover the core of the ask here, would that be correct?
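As a minimal sketch of that shape of bot (three regional runs per day, a templated ping, a post to #spt_gg_forest), assuming cron-style scheduling and placeholder manager handles; the actual implementation was left to Support Operations:

```python
# Minimal sketch of the proposed bot shape (placeholder handles; scheduling
# shown as example cron entries rather than a long-running process):
#
#   0  7 * * 1-5  vbt_bot.py EMEA    # 0700 UTC
#   0 14 * * 1-5  vbt_bot.py AMER    # 1400 UTC
#   30 21 * * 1-5 vbt_bot.py APAC    # 2130 UTC
import sys

REGION_MANAGERS = {
    "EMEA": ["@person1", "@person2"],   # placeholder Slack handles
    "AMER": ["@person3", "@person4"],
    "APAC": ["@person5", "@person6"],
}


def build_message(region, ticket_links):
    mentions = " ".join(REGION_MANAGERS[region])
    if not ticket_links:
        return f"Hey {mentions}, there are no very breached FRT tickets right now."
    return (
        f"Hey {mentions}, please review the following very breached FRT tickets: "
        + " ".join(ticket_links)
    )


if __name__ == "__main__":
    # ticket_links would come from the Zendesk query sketched earlier, and the
    # message would be posted to #spt_gg_forest (e.g. via the webhook sketch).
    print(build_message(sys.argv[1], []))
```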
After discussion with support ops on what was feasible to implement while taking various factors into consideration, the bot has been built to ping at a similar time to the daily SGG Forest ping in each region and will list the FRT tickets that are 2+ days breached.
I have not yet raised MRs for handling this (I was caught a bit short and hadn't realised the bot was going live immediately).
@gitlab-com/support/managers I'm seeking clarification from Support-ops presently as to how the ping is set up and will work with you once I have the answer to that to seek to document this in the handbook. I'll do a notification issue once that's squared away.
A little more information for now:
There were many differing and sometimes conflicting preferences expressed through discussion in this issue. The bot means we still have plenty of flexibility for how we address these in each region.
Reminder: these pings should come through very, very rarely - they are intended to be a fail-state safety net to let us know of any FRT ticket that has been breached for more than 2 days. When these exist, it means that a customer with a:
Normal priority ticket (8 hour response time) has waited 7 times the SLA time and not yet had a response;
High priority ticket (4 hour response time) has waited 13 times the SLA time;
Low priority ticket has waited 2 times the SLA time.
This bot is designed to help us as managers be aware of the existence of these tickets and seek action on them without depending on passive checks of the All FRTs and emergencies view in Zendesk.
It's late on my Monday - I'll pick this up once I have clarification from support-ops and during slightly more sociable working hours, but I wanted to get this update here in case you didn't see my ping in Slack today when I realised that this may actually be live already.