Add Handbook monitoring: Status check, availability, TLS
Summary
handbook.gitlab.com is not covered by any status checks or by GitLab.com monitoring.
internal.gitlab.com should be monitored too, but it requires an authenticated HTTPS session using a GitLab PAT.
Context
Historically, about.gitlab.com was not always monitored by the infrastructure team. Since handbook.gitlab.com was introduced as a new subdomain, I'm not sure that any monitoring exists for it at all.
I cannot find handbook, marketing, or about in the dashboards at https://dashboards.gitlab.net/dashboards (linked from https://handbook.gitlab.com/handbook/engineering/monitoring/).
- 2017, discussion for www-gitlab-com gitlab-com/www-gitlab-com#1569
- 2020, monitoring of deployment jobs that exceed 10 min gitlab-com/www-gitlab-com#7162 (moved)
- 2020 Handbook outage gitlab-com/www-gitlab-com#7897 (closed) with corrective actions to add monitoring gitlab-com/gl-infra/production-engineering#10493 (closed)
- Status: Unclear.
- Open question: will the alerts appear in the #handbook-escalation Slack channel?
- Status: Unclear.
- 2022, continuous URL linting and external monitoring checks gitlab-com/www-gitlab-com#13980 (closed) (I gave up on this last year because there were no DRIs)
Possible solutions
Boring solution
- Run curl/hurl in a CI/CD job
- In the same project, or docsy-gitlab, or dedicated handbook-tools project
- hurl example: https://about.gitlab.com/blog/2022/12/14/how-to-continously-test-web-apps-apis-with-hurl-and-gitlab-ci-cd/
- When the job fails, send an alert to the #handbook-escalation Slack channel
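As an illustration of the boring solution, the reachability check could look roughly like the sketch below. It is written in Python rather than curl or hurl only to keep it self-contained. The URL list, the `check_url` and `main` names, and the 10-second timeout are assumptions for illustration, not an existing script.

```python
import urllib.error
import urllib.request

# Assumption: the public handbook domain only. internal.gitlab.com is left
# out because it needs an authenticated session.
HANDBOOK_URLS = ["https://handbook.gitlab.com/"]


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, detail) for a single HTTPS GET request."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, timeouts, and TLS errors.
        return False, f"unreachable: {exc}"
    return 200 <= status < 400, f"HTTP {status}"


def main(urls=HANDBOOK_URLS, opener=urllib.request.urlopen):
    """Print one line per URL and return 1 if any check failed, else 0."""
    failed = 0
    for url in urls:
        ok, detail = check_url(url, opener=opener)
        print(f"{'OK' if ok else 'FAIL'} {url} ({detail})")
        if not ok:
            failed += 1
    return 1 if failed else 0
```

A scheduled CI/CD job could end the script with `raise SystemExit(main())`, so that a non-zero exit code fails the job and the failure can be forwarded to Slack.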
More advanced solution
- Add an infrastructure task to run a Blackbox exporter check against the handbook domains
- HTTPS reachability
- Certificate validity
- Make the results available in the GitLab Ops Grafana dashboards
- Send alerts to the #handbook-escalation Slack channel, and by email to the backend maintainers
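To make the certificate validity check concrete, here is a minimal Python sketch of what such a probe measures: the number of days until the server certificate expires. It is a stand-in for illustration, not Blackbox exporter configuration. The function names and the 14-day threshold are assumptions.

```python
import socket
import ssl
import time

# Assumption: alert when fewer than 14 days of validity remain.
MIN_DAYS_REMAINING = 14


def fetch_not_after(host, port=443, timeout=10):
    """Open a verified TLS connection and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after, now=None):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' into days left."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def certificate_ok(host):
    """Return True if the certificate for host is valid for long enough."""
    return days_remaining(fetch_not_after(host)) >= MIN_DAYS_REMAINING
```

Calling `certificate_ok("handbook.gitlab.com")` would cover both points at once, because the TLS handshake itself fails with an exception if the certificate is already expired or does not match the host name.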
Wishlist
- External URL monitoring that checks whether URLs return 404 and creates a weekly or monthly report.