Database Lab - Telemetry
As a Database Lab user, I want to collect performance metrics about a clone (e.g. to verify database migrations).
Implement basic support that collects the following artefacts:
- Telemetry collection duration;
- Locks monitoring.
- CI/CD: how can we understand if it fails?
TODO / How to implement
Original idea: new client CLI commands:
- dblab telemetry start - starts collection of metrics (deletes previously collected artefacts);
- dblab telemetry stop - stops collection of metrics, interval services, executes additional scripts;
- dblab telemetry status - gets a short summary about migration status;
- dblab telemetry download - downloads artefacts from the Database Lab machine to the user's machine.
Implement corresponding API handles.
dblab clone create ...
dblab telemetry start
sqitch deploy
dblab telemetry stop
dblab telemetry download
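One possible answer to the CI/CD question above is to compare the downloaded summary against limits and turn any violation into a non-zero exit code. The sketch below is only an illustration: the summary layout, the metric names (`duration_seconds`, `max_lock_wait_seconds`) and the idea of per-metric limits are assumptions, not part of the proposed API.

```python
import sys
from typing import Dict, List


def find_violations(summary: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return one message per limit that the telemetry summary breaks.

    Both dictionaries map a (hypothetical) metric name to a number.
    A metric that has a limit but is absent from the summary counts as a
    violation, so a broken collection run cannot pass silently.
    """
    violations = []
    for metric, limit in limits.items():
        if metric not in summary:
            violations.append(f"{metric}: missing from telemetry summary")
        elif summary[metric] > limit:
            violations.append(f"{metric}: {summary[metric]} exceeds limit {limit}")
    return violations


def main(summary: Dict[str, float], limits: Dict[str, float]) -> int:
    """Print violations to stderr and return a process exit code (0 = pass)."""
    violations = find_violations(summary, limits)
    for message in violations:
        print(f"FAIL {message}", file=sys.stderr)
    return 1 if violations else 0
```

A CI job could call `sys.exit(main(summary, limits))` after `dblab telemetry download`, so the pipeline step fails whenever a limit is exceeded.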
Suggestions from standup call:
- Use Prometheus format for export of time series data
dblab clone observe -f CLONE_ID
- Collect telemetry and expose it in Prometheus format or use https://github.com/wrouesnel/postgres_exporter.
- Collect telemetry all the time for all clones.
dblab clone observe -f CLONE_ID to retrieve current telemetry for the clone.
dblab clone start-observe CLONE_ID and
dblab clone stop-observe CLONE_ID to get aggregated statistics about duration and locks at the end of a DB migration test.
- Execute monitoring queries from Database Lab CLI itself, do not change server or its API.
dblab clone observe -f CLONE_ID to start observing and
dblab clone stop-observe CLONE_ID to stop observing and show aggregated metrics.
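The client-side option could look roughly like the sketch below: poll the clone for sessions blocked on a lock, keep running aggregates for duration and locks, and render the result in the Prometheus text format. Everything named here is an assumption made for illustration: the `observe` and `to_prometheus` helpers, the `Summary` fields, the `dblab_observe_*` metric names, and the choice of query. The sampling function is injected, so the sketch does not depend on a particular PostgreSQL driver.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Query a sampler could run on each poll (PostgreSQL 9.6+).
# query_start is when the statement began, so the value is an upper
# bound on how long the session has actually been waiting for the lock.
LOCK_WAIT_QUERY = """
SELECT pid, extract(epoch FROM now() - query_start) AS query_age_seconds
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
"""


@dataclass
class Summary:
    duration_seconds: float       # wall-clock length of the observation
    max_lock_wait_seconds: float  # largest query age seen among blocked sessions
    max_waiting_sessions: int     # most blocked sessions seen in a single poll


def observe(sample: Callable[[], List[Tuple[int, float]]],
            keep_going: Callable[[], bool],
            interval: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> Summary:
    """Poll `sample` until `keep_going` returns False and aggregate the rows.

    `sample` returns (pid, query_age_seconds) rows, e.g. from LOCK_WAIT_QUERY.
    """
    started = clock()
    max_wait = 0.0
    max_sessions = 0
    while keep_going():
        rows = sample()
        max_sessions = max(max_sessions, len(rows))
        for _pid, age in rows:
            max_wait = max(max_wait, age)
        sleep(interval)
    return Summary(clock() - started, max_wait, max_sessions)


def to_prometheus(summary: Summary, clone_id: str) -> str:
    """Render the summary as gauges in the Prometheus text exposition format."""
    # Label values must escape backslash, double quote and newline.
    label = clone_id.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    metrics = [
        ("dblab_observe_duration_seconds", summary.duration_seconds),
        ("dblab_observe_max_lock_wait_seconds", summary.max_lock_wait_seconds),
        ("dblab_observe_max_waiting_sessions", summary.max_waiting_sessions),
    ]
    lines = []
    for name, value in metrics:
        lines.append(f"# TYPE {name} gauge")
        lines.append(f'{name}{{clone_id="{label}"}} {value}')
    return "\n".join(lines) + "\n"
```

In a real CLI, `sample` would execute `LOCK_WAIT_QUERY` against the clone and `keep_going` would turn false when the user runs the stop command or interrupts the process. Keeping both as parameters makes the aggregation testable without a database.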