Pre-release long shortlist
Major
-
❗ cloning is not working -- path issues -
get rid of stages
concept, use onlyjobs
-
❗ physical-restore: do not complain if PGDATA is not empty, try to launch, skipping pg_basebackup / backup-fetch- sync instance has to be present even if we skipped PGDATA fetching
-
dataDir
andpgDataSubdir
– very confusing. Right nowpgDataSubdir
is a substring ofdataDir
, it's hard to understand, hart to describe and configure.- Idea: define
mountPoint
(describing that it's how disk partition is mounted), and next to it,pgDataSubdir
-- as an addition, not as substring -
mountPoint
andpgDataSubdir
next to each other, in the same section
- Idea: define
-
❗ Lack of automated tests helping to stabilize in terms of bugs and lacking features, quality- While full-features tests in CI are not working now (because of
--privileged
; we need custom CI runners), we still can use automated tests, shell scripts -- and run them when needed manually. !143 (closed)
- While full-features tests in CI are not working now (because of
-
Difficult to troubleshoot -- lack of clarity in error messages -
Specify which container, which directory, etc. Examples: Container is not ready yet. The current state is starting. ... [FATAL] Failed to run the data retrieval service: the data directory is not empty. Use 'forceInit' or empty the data directory
-
If something fails in a secondary container (such as "retriever"), and then it's got deleted, there is no way to understand what happened -- container's log doesn't exist anymore. An example: 2020/08/13 14:36:53 [INFO] Run job: logical-dump. Options: {/var/lib/dblab/db.dump postgres:11-alpine { 0 } {remote {nik-fivetran-test4.cpawoeaiqdwq.us-east-2.rds.amazonaws.com 5432 test3 postgres kDNJGMtguh3PWKihRyPT} <nil>} {[]} 2 <nil>} 2020/08/13 14:36:57 [INFO] Running container: retrieval_logical_dump. ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d 2020/08/13 14:36:58 [INFO] Stopping container ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d 2020/08/13 14:36:59 [INFO] Stop container ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d 2020/08/13 14:36:59 [FATAL] Failed to run the data retrieval service: failed to readiness check: container health check failed
-
-
dockerImage
-- 4 of them is too much- Ideally, only 1 should be used. We can put WAL-G into our "extended" image. And then pgBackRest, and others -- the binaries don't take too much disk space (discussed with @fomin.list)
-
❗ confusion with paths. Currently, we have path to PGDATA on the host machine, plus internal path in dblab container, and we expect that it can be used when running Postgres containers. But obviously, the path configured in config is not considered as internal -- in fact, it's path on the host machine.- consider using
--volumes-from
(https://docs.docker.com/engine/reference/commandline/run/#mount-volumes-from-container---volumes-from; https://www.ionos.com/community/server-cloud-infrastructure/docker/understanding-and-managing-docker-container-volumes/). Slack discussion: https://postgres-ai.slack.com/archives/CSXS2JV6W/p1597333212402900
- consider using
-
❗ images used internally must be automatically pulled. Right now, we have an error and need to dodocker pull
manually -
❗ pg_dump & pg_restore, various issues-
-Fc
->-Fd
-
confusing behaviour: for single-thread piped version, pg_restore
has option to skip owners and privileges. While separatepg_restore
doesn't- Add these options always. Alternatively, allow specifying any options for both
pg_dump
andpg_restore
, providing freedom
- Add these options always. Alternatively, allow specifying any options for both
-
-j
is not working unless it's1
-
vacuum analyze
needs to inherit-j
from the restore configuration (vacuumdb --analyze -j
; and we don't needfreeze
) #143 (closed)
-
Misc
-
Do not use static names for containers -- add some hash or prefix/namespace or allow configuring it, to be able to run multiple database lab instances on the same machine -
What can we remove from configs completely? -
cloning.mode
-
provision.mode
-
pgMgmtUsername
– do we need it?
-
-
PGDATA dir (or future PGDATA dir) permissions: must be 0700 to avoid errors. - We need either to do it automatically or describe it well in config examples
-
if something failed, "retriever" container is left running (is it really so? maybe it's because of the --restart on-failure
option) -
if something failed, .dblab
subdirectory is left inside PGDATA and we need manually clean it to be able to retry -
config locations are confusing - Improvement proposed #151 (closed)
-
PostgreSQL config – always start with a default postgresql.conf
!137 (merged) -
Graceful shutdown (being default): stop all depending containers
!135 (merged))
More config items (inherited from-
local
mode is difficult to understand. Do we need to keep it?-
If we keep it, we need to describe it well. It's used when we use --network
when running the container and port 5432 is busy on the host machine. Withlocal
another port will be chosen automatically. -
Another idea: get rid of local
. Never use 5432. Allow configuring all ports. For clones, never use 5432 as well, even inside containers. Use 6000-6999 inside and map 6000:6000, 6001:6001 (see !127 (closed)) - this will have benefit thatshow port
will show correct port for Postgres, useful for external users. As for sync instance, allow configuring port, and do not use 5432 by default.
-
-
review "retrieval" part in configs @NikolayS -
proofread all the texts in configs -
configs: provide docker run
containers -
reduce the number of config sections -
move debug
inside theglobal
section -
global.dataDir
:-
rename to mountPoint
or smth like this. The idea is to avoid duplicating part of of the path (now,provision.pgDataSubdir
is suffix ofglobal.dataDir
- this looks weird and not intuitive) -
[duplicate] also, where should it be -- in provision
, andretrieval
should be also part ofprovision
?
-
-
provision.pgDataSubdir
-
get rid of leading slashes. When concatenating, add slash automatically. Otherwise, it looks weird -- it starts with /
, so it looks like absolute path -
so, default value becomes an empty string, not '/'
-
-
retrieval
butprovision
– a noun vs a verb. Not really consistent -
idea: always use double-quoted values in the example configs? -
[duplicate] do we need provision.pgMgmtUsername
at all? -
explicitly use the word "clone" in provision.mountDir
-- to emphasize that this option deals with clones. -
provision.snapshotFilterSuffix
is a "negative" filter but it's not clear from the name. Use the word "exclude" -
global.dockerImage
andprovision.dockerImage
:-
we need just one. Two -- too much. And by default, it should be an extended image from us. -
and there is one more dockerImage
, the 3rd, in logical retrieval jobs description -- It's too much.
-
-
for logical-dump
job, use "immediate" word inrestore
to indicate that restoration will happen immediately, without saving the dump file on disk. -
for physical-restore
, fixwalg
storage options. It almost doesn't matter for cloud storages except for GCS -
merge retrieval
toprovision
-
add parallelJobs
in thr "restore" section, defaulting to 2
Triage
-
Add reference for dblab container types that dblab creates, e.g. dblab_phr -
(suggestion) Add environment variables based on dblab config which will be accesible to custom tool, e.g. command: "pg_basebackup -X stream -D $DATADIR" -
Hide docker pull verbose output progress -
Dblab performance degradation related to docker commands (checkpoint?) due to updates or AWS problem? -
Log custom command on "running restore command" -
Add to custom tool section of the config message: "Write your data to datadir defined in config" -
For physical job define password with envs like we do for logical job (currently pass for the physical job can only be set in the config) -
Sync instance support for custom tool is undetermined. -
Remove logger in clean up snapshots commands.
Edited by Artyom Kartasov