"Spike: Implement SLIs for Service Desk"
See gitlab-com/gl-infra/production#3267 (closed)
In December, a regression stopped outbound Service Desk email on .com for almost 8 days. This went unnoticed because correspondents first had to realize that they weren't simply being ignored by Service Desk administrators before they would report an issue.
We should implement some simple SLIs that alert us when deliverability decreases measurably. In this case, identifying the worker that sends email on behalf of Service Desk and tracking its job count might not have been enough, as it still processed the same number of jobs. We should measure outbound email, especially if we can partition it for Service Desk itself.
Proposal
- Identify logs for workers that send Service Desk email;
- Identify existing tracking of email deliverability;
- Check logs for December to measure outbound email;
- Work with infrastructure to produce an SLI that alerts on decreases in deliverability.
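As a rough illustration of the kind of SLI this proposal has in mind, the sketch below compares delivered Service Desk emails against worker jobs per time window and flags a sharp drop relative to recent history. It is only a sketch under stated assumptions: the `Window` type, the `deliverability` and `should_alert` functions, the field names, and the 50% threshold are all hypothetical and do not reflect existing GitLab metrics or alerting rules.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class Window:
    """Counts for one measurement window, e.g. one hour (hypothetical shape)."""
    jobs_run: int          # email worker jobs that completed
    emails_delivered: int  # outbound Service Desk emails confirmed as sent


def deliverability(window: Window) -> float:
    """Fraction of worker jobs that resulted in a delivered email."""
    if window.jobs_run == 0:
        return 1.0  # no traffic in this window, so nothing could fail
    return window.emails_delivered / window.jobs_run


def should_alert(history: Sequence[Window], current: Window,
                 max_drop: float = 0.5) -> bool:
    """Alert when the current ratio falls below max_drop times the
    median ratio of the historical windows (threshold is an assumption)."""
    if not history:
        return False  # no baseline to compare against yet
    baseline = median(deliverability(w) for w in history)
    return deliverability(current) < baseline * max_drop


# Usage: the worker keeps running jobs, but no email leaves the system.
history = [Window(100, 98), Window(120, 119), Window(90, 88)]
regression = Window(jobs_run=110, emails_delivered=0)
healthy = Window(jobs_run=105, emails_delivered=103)
alert_on_regression = should_alert(history, regression)
alert_on_healthy = should_alert(history, healthy)
```

The design point is that the job count alone (`jobs_run`) looks normal in the regression window, so the signal has to come from a measure of outbound email itself.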
TODO:
[ ] Check the reason for the mismatch between incoming and outgoing email events (gitlab-com/runbooks!3650 (comment 599337810))