Staging Canary: Monitor experimental Ruby 3.2 package rollout to gstg-cny
We plan to deploy a Ruby 3.2 package to gstg-cny before the production rollout.
Delivery: Ruby 3.2 rollout to .com and self-man... (gitlab-com/gl-infra&1377 - closed)
gitlab-com/gl-infra/production#18470 (closed)
This issue covers the dashboards and metrics we should monitor. Note that for this rollout we are only interested in staging canary metrics.
We should record data as comments in this issue as we check each of the metrics, logs, etc.
Key metrics to observe
- Dashboards/metrics:
- Monitor the following dashboards for an unhealthy dip in service health for the environment/cluster that is being rolled out:
- Deployment health, configurable with environment, stage, and type/service
- Kubernetes compute resource/cluster health, configurable with clusters
- Kubernetes compute resource/pods health, configurable with clusters and namespace
- Kubernetes networking, configurable with clusters
- Per-service dashboards (change env and stage to toggle between gstg/gprd and main/cny):
  - api (overview, containers)
  - web (overview, containers)
  - websockets (overview, containers)
  - git (overview, containers)
  - sidekiq (overview, containers)
- Kibana - Puma (edit json.type to filter by service, json.stage for cny vs main)
- Kibana - Sidekiq (edit json.shard to switch between job types)
- Sentry
- QA runs can be observed via Slack:
  - #announcements - Besides QA messages, multiple messages are sent to this channel to account for the different deployments.
  - QA Slack channels - There is a channel per environment. For example, a failure on gstg and gstg-cny will be posted in #qa-staging, a failure on gprd-cny and gprd will be posted in #qa-production, etc.
- Dealing with deploy failures: https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/deploy/failures.md
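
For the Kibana checks above, a minimal sketch of how the filters could be written down so they are easy to paste into comments on this issue. It assumes the Kibana search bar accepts KQL-style `field: "value"` clauses; the helper name `kql_filter` is hypothetical, and the shard value `catchall` is only an illustrative placeholder, not something taken from this issue.

```python
def kql_filter(fields: dict[str, str]) -> str:
    """Join field/value pairs into a single KQL-style AND filter string."""
    return " and ".join(f'{name}: "{value}"' for name, value in fields.items())


# Puma logs for the web service on the canary stage
# (json.type and json.stage are the fields named in this issue).
puma_web_canary = kql_filter({"json.type": "web", "json.stage": "cny"})

# Sidekiq logs for one shard; "catchall" is an assumed example value.
sidekiq_shard = kql_filter({"json.shard": "catchall"})

print(puma_web_canary)
print(sidekiq_shard)
```

Swapping `web` for `api`, `git`, or `websockets` should give the equivalent filter for the other Puma-backed services, assuming their json.type values match the service names listed above.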
Edited by Paul Phillips