Provide enhanced production metrics dashboard

For operations teams and managers, it is important to ensure that a service meets its key targets. For example, a company may have SLAs written into agreements that guarantee a specific level of uptime, or it may have an internal goal that 99% of all requests are processed within 200 ms.
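As a rough illustration of the internal goal mentioned above, the following Python sketch checks whether a batch of request latencies meets a "99% within 200 ms" target. The function names and the idea of passing latencies as a plain list are assumptions made for this example, not part of any existing GitLab API.

```python
def fraction_within(latencies_ms, threshold_ms):
    """Return the fraction of requests that completed within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return within / len(latencies_ms)


def meets_goal(latencies_ms, threshold_ms=200, target=0.99):
    """Check a goal such as '99% of requests are processed within 200 ms'.

    threshold_ms and target are illustrative defaults taken from the
    example in the text; a real dashboard would make them configurable.
    """
    return fraction_within(latencies_ms, threshold_ms) >= target
```

For instance, 99 requests at 100 ms plus one at 500 ms would just meet the goal, while two slow requests out of 100 would miss it.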

With GitLab and our monitoring capabilities, we can help companies track these important metrics. Initially, the browser performance tests (https://gitlab.com/gitlab-org/gitlab-ee/issues/3046) could act both as a Pingdom-like service and as a way to exercise multiple parts of an application.

This would provide both a measure of the actual latency as perceived by a browser and a check of whether requests are being answered at all. Further, since these tests run on Runners, a company could place Runners in various parts of the world for even deeper levels of testing.
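To make the idea of a single check concrete, here is a minimal Python sketch of one probe that records whether a URL answered and how long it took. It is a plain HTTP request using the standard library, not the browser performance test itself, so it measures server response time rather than browser-perceived latency. The `probe` function, its result fields, and the choice to treat 2xx and 3xx responses as "up" are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen, clock=time.monotonic):
    """Request url once and return a sample with status, success flag, and latency.

    opener and clock are injectable so the function can be exercised
    without network access; both default to the standard library.
    """
    start = clock()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        status = err.code
    except OSError:
        # Covers URLError, timeouts, and connection failures: no answer at all.
        status = None
    elapsed_ms = (clock() - start) * 1000.0
    # Assumption: 2xx and 3xx count as "up"; everything else counts as "down".
    ok = status is not None and 200 <= status < 400
    return {"url": url, "status": status, "ok": ok, "latency_ms": elapsed_ms}
```

Running such a probe on a schedule from Runners in several regions would yield a stream of samples per location, which is the raw material for the metrics listed below.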

A few key metrics that could be tracked:

  • Uptime & current SLA
  • p95/p99 response times
  • Error rates
  • Failed authentications
  • Current & Max Sessions
  • Deploys, rollbacks (?)
  • Cluster health / resources
  • Component Status (Red/Yellow/Green)
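As a sketch of how the first few metrics in this list might be derived, the Python below summarizes a series of probe samples into uptime, error rate, and p95/p99 response times. The sample shape (`ok` and `latency_ms` keys), the definition of uptime as the share of successful probes, and the nearest-rank percentile method are all assumptions chosen for this example; other definitions are equally reasonable.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Summarize probe samples, each a dict with 'ok' (bool) and 'latency_ms' (float)."""
    total = len(samples)
    if total == 0:
        raise ValueError("no samples")
    # Latency percentiles are computed over successful probes only.
    latencies = [s["latency_ms"] for s in samples if s["ok"]]
    successes = len(latencies)
    return {
        "uptime_pct": 100 * successes / total,
        "error_rate_pct": 100 * (total - successes) / total,
        "p95_ms": percentile(latencies, 95) if latencies else None,
        "p99_ms": percentile(latencies, 99) if latencies else None,
    }
```

The resulting uptime figure can then be compared against the SLA target to show the "current SLA" status on the dashboard.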

These could also extend into business metrics:

  • Conversion rates
  • Daily Active Users
  • New user registrations
  • Infrastructure cost/scaling
  • etc...

This could serve as an excellent executive dashboard, providing an overview of the health of a service along a few different dimensions. Initially, this could serve as an internal dashboard within GitLab, but some of these building blocks could also power a public-facing SaaS service status page.

The general workflow could be:

  • Track the uptime over time of /
  • Support additional URLs (https://gitlab.com/gitlab-org/gitlab-ee/issues/3540)
  • Incorporate additional performance metrics like response times, error rates
  • Add support for business metrics
  • Add CI/CD data like deploys, rollbacks
Edited by Kenny Johnston