Identify critical paths, components, or commands of GDK
Overview
Before establishing alerting for paths, components, or commands of GDK, analyse which of them are the most critical. The top 5 most critical paths, components, or commands can serve as the starting point for alerting.
After establishing the first alerts, we will have a documented process for adding more alerts.
Impacted categories
The following categories relate to this issue:
- gdk-reliability - e.g. when a GDK action fails to complete.
- gdk-usability - e.g. improvements or suggestions around how the GDK functions.
- gdk-performance - e.g. when a GDK action is slow or times out.
Proposal
For the GDK commands, we can derive the following high-level criticality list:
| command | category | criticality | alerting | metrics/SLI | SLO | comment |
|---|---|---|---|---|---|---|
| install | setup | high | yes | success rate | 80% | |
| reconfigure | setup | high | yes | success rate | 80% | |
| update | managing | high | yes | success rate | 80% | |
| config | settings | medium | no | |||
| doctor | troubleshooting | medium | yes (when the number of executions increases significantly) | | | If the number of executions goes up, it can be an indicator of users trying to solve problems. |
| kill | environment | medium | no | |||
| report | troubleshooting | medium | no | |||
| restart | managing | medium | no | |||
| start | managing | medium | yes | success rate | 80% | |
| status | managing | medium | no | |||
| stop | managing | medium | no | |||
| tail | troubleshooting | medium (not even tracked atm) | no | |||
| cells | component | low | no | |||
| cleanup | environment | low | no | |||
| console | tool | low | no | |||
| debug-info | deprecated | low | no | |||
| diff-config | settings | low | no | |||
| help | tool | low | no | |||
| predictive | tool | low | no | |||
| pristine | environment | low | no | | | If the number of executions goes up, it can be an indicator of users trying to solve problems. |
| reset-data | environment | low | no | | | If the number of executions goes up, it can be an indicator of users trying to solve problems. |
| sandbox | tool | low | no | |||
| send-telemetry | telemetry | low | no | |||
| switch | tool | low | no | |||
| clickhouse | clickhouse | low (not even tracked atm) | no | |||
| env | settings | low (not even tracked atm) | no | |||
| measure | environment | low (not even tracked atm) | no | |||
| measure-workflow | environment | low (not even tracked atm) | no | |||
| open | tool | low (not even tracked atm) | no | |||
| telemetry | settings | low (not even tracked atm) | no | |||
| psql | tool | low (not even tracked atm) | no | |||
| psql-geo | tool | low (not even tracked atm) | no | |||
| rails | tool | low (not even tracked atm) | no | |||
| rake | tool | low (not even tracked atm) | no | |||
| redis-cli | tool | low (not even tracked atm) | no | |||
| reset-openbao-data | environment | low (not even tracked atm) | no | |||
| reset-praefect-data | environment | low (not even tracked atm) | no | |||
| reset-registry-data | environment | low (not even tracked atm) | no | |||
| import-registry-data | tool | low (not even tracked atm) | no | |||
| truncate-legacy-tables | environment | low (not even tracked atm) | no | |||
| version | managing | low (not even tracked atm) | no | |||
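
As a rough illustration of how the 80% success-rate SLO and the `doctor` execution-count signal from the table could be evaluated, here is a minimal Python sketch. The `CommandStats` shape, the `SLO_TARGETS` mapping, the function names, and the 2x spike factor are assumptions made for illustration only. They are not part of GDK or its telemetry, and the table only says "significantly increasing" without defining a threshold.

```python
from dataclasses import dataclass
from typing import List, Optional

# SLO targets copied from the table above: commands with alerting "yes"
# and an SLO of 80% success rate.
SLO_TARGETS = {"install": 0.80, "reconfigure": 0.80, "update": 0.80, "start": 0.80}


@dataclass
class CommandStats:
    """Execution counts for one command in one time window (hypothetical shape)."""

    command: str
    successes: int
    failures: int

    @property
    def total(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> Optional[float]:
        # No executions in the window means there is no rate to judge.
        return self.successes / self.total if self.total else None


def slo_breaches(stats: List[CommandStats]) -> List[str]:
    """Return the commands whose success rate is below their SLO target."""
    breaches = []
    for s in stats:
        target = SLO_TARGETS.get(s.command)  # None for commands without an SLO
        rate = s.success_rate
        if target is not None and rate is not None and rate < target:
            breaches.append(s.command)
    return breaches


def executions_spiked(current: int, baseline: float, factor: float = 2.0) -> bool:
    """Flag a spike in executions, e.g. for `doctor`.

    `factor` is an assumed threshold; the proposal does not define one.
    """
    return baseline > 0 and current > baseline * factor


if __name__ == "__main__":
    window = [
        CommandStats("install", successes=70, failures=30),
        CommandStats("update", successes=90, failures=10),
        CommandStats("config", successes=0, failures=10),  # no SLO, ignored
    ]
    print(slo_breaches(window))
    print(executions_spiked(current=50, baseline=20.0))
```

Keeping the targets in a single mapping means that adding an alert for another command is a one-line change, which fits the goal of having a documented way to add more alerts later.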
Edited by Mohga Gamea