Basic clone observability
Goal
To detect dangerous DDL and slow queries and describe reasons for it, DBLab should provide more data about DB migrations and EXPLAIN plans
TODO / How to implement
-
Add an additional mount point to store observation artifacts - 2h
-
Define artifacts structure and provide artifact management - (6-8h)
/observer/ clone_id/ session_id/ pg_stat_statements/ statements.csv pg_stats/ pg_stats_database.csv pg_stat_bgwriter.csv pg_stat_user_tables.csv objects_size.csv summary.json
summary.json
{ "session_id": 1, "clone_id": "c0nta38hmvj68nrgn7jg", "duration": { "total": "35s", "started_at": "2021-02-19T15:11:23.508Z", "finished_at": "2021-02-19T15:11:59.106167994Z", "max_query": "239ms" }, "db_size": { "total": "7.7 MiB", "diff": "10 B", "objects_stat": { "count": 62, "row_estimate_sum": 23232, "total_size_bytes_sum": 8101888, "table_size_bytes_sum": 4218880, "indexes_size_bytes_sum": 3244032, "toast_size_bytes_sum": 638976 } }, "locks": { "total_interval": 4, "warning_interval": 0 }, "stats": {} }
-
Collect data during observation sessions and store observation results as files - (14-18h)
- dangerous locks detected (consider as an artifact)
- duration
- duration in Lab
- estimated duration for prod
- pg_stat_statements - as an “amount of work” - buffers read/hit/dirtied/written
- pg_stat_*** standard views:
- op/tuple stats
- pg_stat_bgwriter (checkpointer, bgwriter, backend buffer stats)
- WAL written
- DB and its objects sizes and changes (similarly to L001 in postgres-checkup)
- dead tuples and bloat: should we analyze them?
- autovacuum settings – so far we didn't take care of them! (match prod)
- summary: ???? (what should be there - discuss and decide)
-
Serve results via DLE API - (8-10h)
- GET
/observation/results/{clone_id}/{session_id}
- returns a session summary - GET
/observation/results/{clone_id}/{session_id}/list
- returns a list of session result files - GET
/observation/results/{clone_id}/{session_id}/file?path=pg_stat_activity/activity.csv
- download a session result file
- GET
-
Clean up artifacts when destroying a clone - 2h