Slack Message Analysis for #infrastructure-lounge
Background
There is a lot going on in the #infrastructure-lounge channel in Slack, and many people read all of these messages, especially SREs who are keeping tabs on what is happening in the system. At an average of 50 messages per week, that is a lot to follow. The purpose of this analysis is to see how we can reduce context switching for team members who use this channel, and how we can make sure that we address all requests for help in a timely manner.
Notes about the Process
- Asked it-help for an extract of the channel. They gave me the Slack JSON file for the channel, which included all messages within the retention period.
- Pulled this into a Google Sheet https://docs.google.com/spreadsheets/d/1SkN__S3ojaEc_e_s9AbwB9S90MSPGDvN5EZXT49aMpI/edit
- Got this into Tableau
- Categorized the messages (very roughly to start) - 72 unclassified because my eyes couldn't do it anymore!
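The rough categorization step above was done by hand, but a first pass could be automated with keyword rules. The following is a minimal sketch, assuming the extract is a JSON array of message objects that each carry a `text` field (as in a standard Slack export). The `RULES` table, its keywords, and the function names are hypothetical and purely illustrative; they are not what was used for this analysis.

```python
import json
from collections import Counter

# Hypothetical keyword rules. The category names mirror the findings in this
# document, but the keywords themselves are illustrative assumptions.
RULES = {
    "zoekt": ["zoekt"],
    "runners": ["runner"],
    "observability": ["observability", "dashboard"],
    "change_request": ["change request", "rails console"],
}


def categorize(text):
    """Return the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"


def load_messages(path):
    """Load a channel extract, assumed to be a JSON array of messages."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def category_counts(messages):
    """Count messages per category; messages without text are unclassified."""
    return Counter(categorize(m.get("text", "")) for m in messages)
```

Anything that falls through to `unclassified` would still need a manual look, so this only shrinks the pile rather than replacing the review.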
Findings
1. Putting Teleport in its own channel has cut down the message rate significantly
I did notice other automation and confirmed with Jarv that these are guards against applying changes during the change lock periods.
Actions:
- Find a different home for the Change Lock messages
2. Zoekt
8 out of the past 12 weeks have included messages about Zoekt. These messages are asking for approvals on indexer version bumps, replica count bumps, image changes, webserver memory, etc.
Recommendation:
- Can we enable them to self-serve?
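A figure like "8 out of the past 12 weeks" can be recomputed from the extract by collecting the distinct weeks in which a topic is mentioned. This is a minimal sketch, assuming each message has a `text` field and a `ts` field holding a Unix timestamp as a string (the usual shape in a Slack export); the function name `weeks_mentioning` is hypothetical.

```python
from datetime import datetime, timezone


def weeks_mentioning(messages, keyword):
    """Return the set of (ISO year, ISO week) pairs with a keyword mention."""
    needle = keyword.lower()
    weeks = set()
    for message in messages:
        if needle in message.get("text", "").lower():
            # Slack timestamps look like "1700000000.000100" (seconds.micros).
            when = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
            iso = when.isocalendar()
            weeks.add((iso[0], iso[1]))
    return weeks
```

Comparing `len(weeks_mentioning(messages, "zoekt"))` against the number of weeks in the extract gives the ratio quoted in the finding.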
3. Runners
In an average week we get at least one question about Runners. Once a month there is a request for someone to pause or un-pause the gitlab-org-qa-runner.
Recommendation:
- Find out why SREs are needed to work on the gitlab-org-qa-runner. These messages are always from @svistas.
4. Observability
At least one question a week, and often more. The questions range from access and troubleshooting to feature requests. Ideally these would all go to the Observability team.
Recommendation:
- Figure out how to route these questions out of the lounge and to the Observability team.
5. Customer Support
The most frequent posters are the Customer Support group. Their questions vary widely and seldom have quick answers. This group would benefit from having their own way to interact with production engineers, and having the messages in their own space might make it easier for them to search for previous answers. This is also one of the easiest groups to train to use a specific channel or specific method.
Recommendation:
- Create a separate channel for customer support queries for Infrastructure and use the it-help bot method where every message turns into an issue. This will ensure that no questions are missed and will create a dataset of questions and answers.
6. Change Requests
Change Requests weren't as frequent as I thought, but they still occur regularly each month. Between these, rails console requests, and requests to change admin settings, this feels like it's masking some missing application features.
Recommendation:
- We should gather more data about change requests to categorise them. Are they missing application features? Are they direct data manipulation? One suggestion is to update the change request template to require specific labels.
7. Technical Support
There is a huge bucket of requests that are technical support. This is going to need its own analysis. So in the spirit of iteration, I'll do that separately and add it here when I'm done.