Database Lab - Telemetry
Goal
As a Database Lab user, I want to be able to collect performance metrics about a clone (e.g. for verifying database migrations).
Implement basic support with collection of the following artefacts:
- Telemetry collection duration;
- Locks monitoring;
- CI/CD: how can we understand if it fails?
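A minimal sketch of how the locks artefact could be aggregated, assuming periodic snapshots of `pg_locks` are sampled while a migration runs. The row structure and key names here are hypothetical, not part of Database Lab:

```python
from collections import Counter

def summarize_lock_samples(samples):
    """Aggregate periodic pg_locks snapshots into a simple artefact.

    `samples` is a list of snapshots; each snapshot is a list of dicts
    with hypothetical keys: "mode" (lock mode) and "granted" (bool).
    """
    waiting = Counter()  # lock mode -> how often a lock was seen ungranted
    peak_waiting = 0     # max number of ungranted locks in a single snapshot
    for snapshot in samples:
        ungranted = [row for row in snapshot if not row["granted"]]
        peak_waiting = max(peak_waiting, len(ungranted))
        for row in ungranted:
            waiting[row["mode"]] += 1
    return {"waiting_by_mode": dict(waiting), "peak_waiting": peak_waiting}

# Example: two snapshots taken during a migration test
samples = [
    [{"mode": "AccessExclusiveLock", "granted": True}],
    [{"mode": "AccessExclusiveLock", "granted": True},
     {"mode": "RowExclusiveLock", "granted": False}],
]
print(summarize_lock_samples(samples))
# {'waiting_by_mode': {'RowExclusiveLock': 1}, 'peak_waiting': 1}
```

A real collector would fill `samples` by querying `pg_catalog.pg_locks` on an interval; the aggregation step stays the same.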
TODO / How to implement
Original idea: new client CLI commands:
- `dblab telemetry start` - starts collection of metrics (deletes previously collected artefacts);
- `dblab telemetry stop` - stops collection of metrics and interval services, executes additional scripts;
- `dblab telemetry status` - gets a short summary of the migration status;
- `dblab telemetry download` - downloads artefacts from the Database Lab machine to the user's machine.
Implement the corresponding API handlers.
Usage:

```shell
dblab clone create ...
dblab telemetry start
sqitch deploy
dblab telemetry stop
dblab telemetry download
```
Suggestions from the standup call:
- Use the Prometheus format for exporting time series data: `dblab clone observe -f CLONE_ID`.
- Use `start`/`stop` or not?
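To make the Prometheus suggestion concrete, here is a sketch of rendering clone metrics in the Prometheus text exposition format. The metric names are hypothetical, not existing Database Lab metrics:

```python
def to_prometheus_text(metrics):
    """Render metrics in the Prometheus text exposition format.

    `metrics` maps (metric_name, labels) to a numeric value, where
    `labels` is a tuple of (key, value) pairs.
    """
    lines = []
    for (name, labels), value in sorted(metrics.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical metric names for a clone being observed
metrics = {
    ("dblab_clone_observe_duration_seconds", (("clone_id", "c1"),)): 42.5,
    ("dblab_clone_lock_waits_total", (("clone_id", "c1"),)): 3,
}
print(to_prometheus_text(metrics))
```

In practice, postgres_exporter (linked below) already exposes most per-database metrics in this format, which is why reusing it is one of the options.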
Improved idea:
- Collect telemetry and expose it in Prometheus format, or use https://github.com/wrouesnel/postgres_exporter.
- Collect telemetry all the time for all clones.
- Use `dblab clone observe -f CLONE_ID` to retrieve current telemetry for the clone.
- Use `dblab clone start-observe CLONE_ID` and `dblab clone stop-observe CLONE_ID` to get aggregated statistics about duration and locks at the end of a DB migration test.
Fastest implementation:
- Execute monitoring queries from the Database Lab CLI itself; do not change the server or its API.
- Use `dblab clone observe -f CLONE_ID` to start observing and `dblab clone stop-observe CLONE_ID` to stop observing and show aggregated metrics.
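The fastest variant, polling monitoring queries from the client side only, could be sketched as follows. The class and all names are illustrative, and the query runner is stubbed out; a real CLI would run SQL (e.g. against `pg_locks`) on the clone:

```python
import time

class CloneObserver:
    """Poll a monitoring query from the client and aggregate the results,
    without changing the Database Lab server or its API."""

    def __init__(self, run_query):
        self.run_query = run_query  # callable returning waiting-lock count
        self.samples = []
        self.started_at = None

    def start(self):
        self.samples = []           # drop previously collected artefacts
        self.started_at = time.monotonic()

    def poll(self):                 # called on an interval while observing
        self.samples.append(self.run_query())

    def stop(self):                 # returns the aggregated metrics
        return {
            "duration_seconds": time.monotonic() - self.started_at,
            "polls": len(self.samples),
            "max_waiting_locks": max(self.samples, default=0),
        }

# Usage with a stubbed query returning 0, 2, 1 waiting locks in turn
obs = CloneObserver(run_query=iter([0, 2, 1]).__next__)
obs.start()
for _ in range(3):
    obs.poll()
summary = obs.stop()
print(summary["polls"], summary["max_waiting_locks"])  # 3 2
```

This keeps all state in the CLI process, which matches the constraint above: no server or API changes are needed.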
Acceptance criteria
Edited by Anatoly Stansler