FY21-Q1 Ops Section OKR: Dogfood Monitoring => 70%

Key Results

  • Replicate 10 existing Grafana dashboards used by the Infrastructure team => 100%, 20 dashboards replicated. Established a process for replicating all existing dashboards via CI automation.
  • Identify gaps between Grafana and GitLab dashboards needed to support existing Infrastructure workflows => 85%, some gaps slipped in the 12.10 release, but continuous deployment to gitlab.com helps mitigate this.
  • Phase out usage of 1 migrated Grafana dashboard (after addressing gaps) => 70%, dashboard identified: the GitLab General SLA Dashboard replaces the Grafana General SLA Dashboard. The replacement version is live and an MR is open to turn off the Grafana version.

Notes

The objective is to drive internal usage of features within the Ops section in order to create stronger feedback loops and product validation.

For project details and related issue see epic &281 (closed).

Migrating Dashboards

# | Dashboard Name | Grafana URL | GitLab URL | Notes
1. | SLA Dashboard | https://dashboards.gitlab.net/d/general-slas/general-slas?orgId=1&from=now-1h&to=now | https://ops.gitlab.net/gitlab-com/metrics-dogfooding/-/environments/266/metrics?dashboard=.gitlab%2Fdashboards%2Fuptime.yml | Identified feature gaps in epic gitlab-org&2541
2. | SRE Key Service Metrics | https://dashboards.gitlab.net/d/general-service/general-service-platform-metrics?orgId=1 | https://ops.gitlab.net/gitlab-com/metrics-dogfooding/-/environments/266/metrics?dashboard=.gitlab%2Fdashboards%2Fkey_service_web.yml | Notes on gaps in #6508 (comment 287796125)
3. | Key Metrics API | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-api.yml |
4. | Key Metrics CI Runners | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-ci-runners.yml |
5. | Key Metrics Frontend | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-frontend.yml |
6. | Key Metrics Git | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-git.yml |
7. | Key Metrics Gitaly | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-gitaly.yml |
8. | Key Metrics Monitoring | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-monitoring.yml |
9. | Key Metrics NFS | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-nfs.yml |
10. | Key Metrics Patroni | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-patroni.yml |
11. | Key Metrics PG Bouncer | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-pgbouncer.yml |
12. | Key Metrics Praefect | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-praefect.yml |
13. | Key Metrics Redis Cache | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-redis-cache.yml |
14. | Key Metrics Redis Sidekiq | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-redis-sidekiq.yml |
16. | Key Metrics Redis | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-redis.yml |
17. | Key Metrics Registry | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-registry.yml |
18. | Key Metrics Sidekiq | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-sidekiq.yml |
19. | Key Metrics Web Pages | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-web-pages.yml |
20. | Key Metrics Web | Grafana URL | https://ops.gitlab.net/gitlab-com/runbooks/-/environments/103/metrics?dashboard=.gitlab%2Fdashboards%2Fkey-metrics-web.yml |
21. | General Dashboard | https://dashboards.gitlab.net/d/general-public-splashscreen/general-gitlab-dashboards?orgId=1 | TBD GitLab URL |
22. | API Overview | https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1 | TBD GitLab URL |
23. | Web Overview | https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1 | TBD GitLab URL |
24. | Registry Overview | https://dashboards.gitlab.net/d/registry-main/registry-overview?orgId=1 | TBD GitLab URL |
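
Each GitLab URL listed above names its dashboard definition file in a percent-encoded `dashboard` query parameter (for example `.gitlab%2Fdashboards%2Fuptime.yml`). As a minimal sketch for cross-checking a URL against the repository, the snippet below decodes that parameter with Python's standard library. The helper name `dashboard_path` is hypothetical and not part of any GitLab tooling.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def dashboard_path(gitlab_url: str) -> Optional[str]:
    """Return the decoded `dashboard` query parameter, or None if it is absent."""
    # parse_qs percent-decodes values, so %2F becomes "/".
    query = parse_qs(urlsplit(gitlab_url).query)
    values = query.get("dashboard")
    return values[0] if values else None


# URL copied from row 1 of the table (SLA Dashboard).
url = (
    "https://ops.gitlab.net/gitlab-com/metrics-dogfooding/-/environments/266/"
    "metrics?dashboard=.gitlab%2Fdashboards%2Fuptime.yml"
)
print(dashboard_path(url))  # prints ".gitlab/dashboards/uptime.yml"
```

Rows whose GitLab URL is still TBD have no such parameter, so the helper returns `None` for them.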

Retro

Good

  • set up a better system for iterating by mirroring dashboards between gitlab.com (deployed daily) and ops.gitlab.net (used to manage gitlab.com)
  • significant progress on dashboard capabilities positions us to start removing Grafana dashboards
  • the team seems energized by the progress made here.

Bad

  • some late-breaking bugs and issues in the 12.10 release pushed back the timeline for removing the 1st Grafana dashboard by several weeks.
  • we often describe our solution negatively and are still working to establish more credibility with the Infrastructure team

Try

  • focus earlier on a rapid feedback cycle. For much of the quarter the team was unable to see dashboard improvements populated with production monitoring data, so several issues weren't discovered until late in the quarter (when mirrored deploys of dashboards to gitlab.com began)
  • build relationships with members of the Infrastructure team
  • train the team to present monitoring features more effectively and not to imply they will generally be inferior to other monitoring tools
Edited by Cynthia "Arty" Ng