Automated data retrieval (declarative database lab initialization)
Goal
TODO / How to implement
Config example
retrieval:
  stages:
    - initialize
  spec:
    initialize:
      jobs:
        # - name: logical-restore
        #   options:
        #     dumpFile: /tmp/db.dump
        #     forceInit: false
        #     dbName: test
        #     partial:
        #       tables:
        #         - test
        - name: physical-restore
          options:
            tool: walg
            dockerImage: "postgresai/sync-instance:12"
            envs:
              WALG_GS_PREFIX: "gs://{BUCKET}/{SCOPE}"
            walg:
              storage: gcs
              backupName: LATEST
              credentialsFile: /tmp/sa.json # optional
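
A rough sketch of how the declarative config could be turned into an ordered job plan. It assumes the YAML has already been parsed into a plain dict (the parsing step is left out), and `planned_jobs` is a hypothetical helper name, not an existing function:

```python
# Assumed in-memory form of the config example (after YAML parsing).
config = {
    "retrieval": {
        "stages": ["initialize"],
        "spec": {
            "initialize": {
                "jobs": [
                    {
                        "name": "physical-restore",
                        "options": {
                            "tool": "walg",
                            "dockerImage": "postgresai/sync-instance:12",
                            "envs": {"WALG_GS_PREFIX": "gs://{BUCKET}/{SCOPE}"},
                            "walg": {"storage": "gcs", "backupName": "LATEST"},
                        },
                    }
                ]
            }
        },
    }
}


def planned_jobs(cfg):
    """Return (stage, job name) pairs in the order the stages are listed."""
    retrieval = cfg["retrieval"]
    plan = []
    for stage in retrieval["stages"]:
        spec = retrieval["spec"].get(stage)
        if spec is None:
            # A stage that is listed but not described is a config error.
            raise ValueError(f"stage {stage!r} is listed but has no spec")
        for job in spec.get("jobs", []):
            plan.append((stage, job["name"]))
    return plan


print(planned_jobs(config))
```

Commented-out jobs (like logical-restore above) simply do not appear in the parsed dict, so they are skipped without any extra logic.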
---- OLD Implementation:
- Stage interface (data retrieval, promotion, mask, etc) 5h
  - Clones/snapshot usage
  - Docker container provision
  - Each stage running in separate container
- Pipeline:
  - [dump/restore | WAL-G | barman -> PGDATA] -> [PGDATA master/replica -> master -> remove PII -> snapshot]
  - [import] -> [promote] -> [mask]
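
A minimal sketch of what the stage interface and pipeline could look like. The names `Stage`, `RecordingStage` and `run_pipeline` are made up for illustration, and the PGDATA path is a placeholder. Container provisioning is intentionally left out: a real stage would start its own container inside `run`.

```python
from typing import List, Protocol


class Stage(Protocol):
    """Assumed common interface for import / promote / mask stages."""

    name: str

    def run(self, pgdata: str) -> None:
        """Do the stage's work against the given PGDATA directory."""
        ...


class RecordingStage:
    """Placeholder stage that only records that it ran (no Docker, no Postgres)."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def run(self, pgdata: str) -> None:
        self.log.append(f"{self.name}:{pgdata}")


def run_pipeline(stages: List[Stage], pgdata: str) -> None:
    """Run stages strictly in order; an exception in one stops the rest."""
    for stage in stages:
        stage.run(pgdata)


log: List[str] = []
pipeline = [RecordingStage(n, log) for n in ("import", "promote", "mask")]
run_pipeline(pipeline, "/path/to/pgdata")  # placeholder path
print(log)
```

Keeping stages behind one interface means an optional stage (for example promote) can be dropped from the list without touching the runner.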
- Stage 1: [import]
  - Configuration 2h
    - snapshot TTL
    - mode
    - dockerImage
    - dump/restore
      - connection params
      - plain-text, directory, custom (-Fc -Fd)
  - Experiments/Preparations 6h
    - Massive diff (!)
    - Delete previous clone snapshots
    - Massive diff (!)
  - Logic 6h
    - dump/restore
      - set statement timeout to 0
  - Configuration 2h
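
A possible shape for the dump/restore logic of the import stage, as a sketch only. `restore_command` is a hypothetical helper. It relies on two things that hold for stock PostgreSQL tools: plain-text dumps are replayed with `psql`, while custom and directory dumps go through `pg_restore` (which detects the archive format itself), and libpq clients pick up `PGOPTIONS`, so `statement_timeout=0` can be passed through the environment. The command is only built here, not executed.

```python
def restore_command(dump_path, fmt, db_name):
    """Return (argv, extra_env) for restoring a dump of the given format.

    fmt is one of "plain", "custom", "directory".
    """
    # Disable the statement timeout for the whole restore session.
    extra_env = {"PGOPTIONS": "-c statement_timeout=0"}

    if fmt == "plain":
        # Plain-text dumps are SQL scripts, so they are fed to psql.
        argv = ["psql", "--dbname", db_name, "--file", dump_path]
    elif fmt in ("custom", "directory"):
        # pg_restore determines the archive format automatically.
        argv = ["pg_restore", "--dbname", db_name, dump_path]
    else:
        raise ValueError(f"unsupported dump format: {fmt!r}")

    return argv, extra_env


argv, extra_env = restore_command("/tmp/db.dump", "custom", "test")
print(argv, extra_env)
```

Connection params (host, port, user) are not covered; they would be appended to `argv` or supplied through the usual `PG*` environment variables.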
- Stage 2: [promote]
  - Can be optional, since we want to give SREs the ability to manually promote their clone.
  - Configuration 1h
    - dockerImage
  - Logic 6h
- Stage 3: [mask] OUT-OF-SCOPE
- Pipeline scheduling (run data retrieval on interval) 4h
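
Interval scheduling could be as simple as the loop below. This is a sketch under assumptions: `run_on_interval` is a hypothetical name, and `max_runs` and the injectable `sleep` exist only so the loop can be bounded and exercised without real waiting.

```python
import time


def run_on_interval(job, interval_s, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing interval_s seconds between runs.

    Runs forever when max_runs is None; returns the number of runs otherwise.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Do not sleep after the final run of a bounded loop.
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return runs


# Usage: three retrieval runs with a fake sleep that only records the pauses.
pauses = []
count = run_on_interval(lambda: None, 3600, max_runs=3, sleep=pauses.append)
print(count, pauses)
```

The interval is measured from the end of one run to the start of the next, so a long retrieval never overlaps with the following one.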
Documentation:
- Notify users about autovacuum pause 1h
- Pipelines docs 6h
  - Configs
  - How it works
  - How to extend