Let's figure out how monitoring works at GitLab
We have a lot of documentation:
- https://gitlab.com/gitlab-com/runbooks/tree/master/monitoring
- https://gitlab.com/gitlab-com/runbooks/blob/master/howto/monitoring-overview.md
- https://about.gitlab.com/handbook/engineering/monitoring/
But none of this describes how monitoring is wired up and configured for GitLab.com. Examples of what is missing:
- Thanos and its configuration
- Network peering, and the fact that our dashboards talk to ONE Thanos server that reaches out to the appropriate servers
- How alerting is managed across our various environments
- What Trickster does
- Nothing monitors the monitor, so at times our dashboards disappear and nothing tells us that this has happened until someone looks at a blank dashboard
- I've personally struggled to connect all the dots across our various Chef roles and cookbooks, so it would be wise to figure out how things are configured and which components live where
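
To make the "nothing monitors the monitor" gap concrete, here is a minimal sketch of an external check that probes the monitoring stack itself. It assumes the components expose the standard `/-/healthy` HTTP endpoint that Prometheus and Thanos provide. The hostnames, ports, and the `TARGETS`, `is_healthy`, and `find_unhealthy` names are hypothetical and not taken from our actual configuration.

```python
import urllib.error
import urllib.request

# Hypothetical endpoints: the real hostnames and ports are not listed in this issue.
TARGETS = {
    "thanos-query": "http://thanos-query.example.internal:10902/-/healthy",
    "prometheus": "http://prometheus.example.internal:9090/-/healthy",
}


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False


def find_unhealthy(targets, probe=is_healthy):
    """Return the names of the targets whose health endpoint did not answer."""
    return [name for name, url in targets.items() if not probe(url)]
```

Something like `find_unhealthy(TARGETS)` would have to run from a host outside the monitoring stack (a cron job, for example) and notify through a channel that does not depend on the dashboards it is checking. Otherwise it fails silently along with them.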
Use this issue to:
- Come up with an outline of all things monitoring
- Learn this and anything related
- Schedule some time with a few team members to present these learnings
- Supplement and update our existing documentation
Edited by John Skarbek