Internal Ops Dashboard
For operations teams and managers, it is important to ensure that a service meets its key targets. For example, a company may have SLAs written into agreements that guarantee a specific level of uptime, or it may have an internal goal that 99% of all requests are processed within 200ms.
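To make the uptime example concrete, here is a minimal sketch of how an uptime SLA translates into a downtime budget. The function name and the 30-day window are assumptions chosen for illustration; a real agreement defines its own measurement window.

```python
def allowed_downtime_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an uptime SLA over a window.

    Both the name and the default 30-day window are illustrative assumptions.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


# Print the budget for a few common SLA levels.
for sla in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime allows about {budget:.2f} minutes of downtime per 30 days")
```

A dashboard could show the remaining budget next to the current uptime figure, so it is clear how close the service is to breaching its SLA.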
With GitLab's monitoring capabilities, we can help companies track these important metrics. Initially, the browser performance tests (#3046 (closed)) could act both as a Pingdom-like service and as a way to exercise multiple parts of an application.
This would provide both a measure of the actual latency as perceived by a browser and an indication of whether requests were being answered at all. Further, since these tests run on Runners, a company could place Runners in various parts of the world for even deeper levels of testing.
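As a rough illustration of the Pingdom-like idea, the sketch below times a single request and records whether a response came back. It uses a plain HTTP request from Python's standard library as a stand-in, so it does not capture browser-perceived latency the way the browser performance tests would. `ProbeResult`, `http_fetch`, and `probe` are hypothetical names, not part of GitLab.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool                # True when a response with status < 400 came back
    status: Optional[int]   # HTTP status code, or None when no response arrived
    latency_ms: float       # wall-clock time spent on the attempt


def http_fetch(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL with the standard library and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status


def probe(url: str, fetch: Callable[[str], int] = http_fetch) -> ProbeResult:
    """Time one request; `fetch` is injectable so the probe can be tested offline."""
    start = time.perf_counter()
    try:
        status: Optional[int] = fetch(url)
    except OSError:  # URLError and socket timeouts are OSError subclasses
        status = None
    latency_ms = (time.perf_counter() - start) * 1000
    ok = status is not None and status < 400
    return ProbeResult(url, ok, status, latency_ms)


# Example (performs a real network request, so it is left commented out):
# print(probe("https://example.com/"))
```

Running the same probe from Runners in several regions would yield one series of results per location.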
A few key metrics that could be tracked:
- Uptime & current SLA
- p95/p99 response times
- Error rates
- Failed authentications
- Current & Max Sessions
- Deploys, rollbacks (?)
- Cluster health / resources
- Component Status (Red/Yellow/Green)
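Several of the metrics listed above (uptime, p95/p99 response times, error rates) can be derived from the same stream of probe samples. The following is a minimal sketch under stated assumptions: `Sample` and `summarize` are hypothetical names, uptime is taken as the share of probes that received any response, the error rate as the share of responses with a 5xx status, and percentiles use the nearest-rank method. A real dashboard may define these differently.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    latency_ms: float
    status: Optional[int]  # HTTP status code, or None when no response arrived


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Compute uptime, error rate, and p95/p99 latency from probe samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    answered = [s for s in samples if s.status is not None]
    server_errors = [s for s in answered if s.status >= 500]
    latencies = [s.latency_ms for s in answered]
    return {
        # Assumption: "up" means the probe received any response at all.
        "uptime_percent": 100 * len(answered) / len(samples),
        # Assumption: only 5xx responses count as errors.
        "error_rate_percent": (
            100 * len(server_errors) / len(answered) if answered else None
        ),
        "p95_ms": percentile(latencies, 95) if latencies else None,
        "p99_ms": percentile(latencies, 99) if latencies else None,
    }
```

For instance, `summarize([Sample(120.0, 200), Sample(180.0, 200), Sample(950.0, 500)])` reports full uptime, because every probe was answered, alongside a non-zero error rate.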
These operational metrics could also extend into business metrics:
- Conversion rates
- Daily Active Users
- New user registrations
- Infrastructure cost/scaling
This could serve as an excellent executive dashboard, providing an overview of the health of a service along a few different dimensions. Initially, this could serve as an internal dashboard within GitLab, but some of these building blocks could also power a public-facing SaaS Service Status page.
The general workflow could be:
Track the uptime over time of
- Support additional URLs (#3540)
- Incorporate additional performance metrics like response times, error rates
- Add support for business metrics
- Add CI/CD data like deploys, rollbacks