Pre-release long shortlist

Major

❗ cloning is not working -- path issues
get rid of stages concept, use only jobs
❗ physical-restore: do not complain if PGDATA is not empty, try to launch, skipping pg_basebackup / backup-fetch
- sync instance has to be present even if we skipped PGDATA fetching
dataDir and pgDataSubdir – very confusing. Right now pgDataSubdir is a substring of dataDir, which makes it hard to understand, describe, and configure.
- Idea: define mountPoint (describing where the disk partition is mounted) and, next to it, pgDataSubdir as an addition, not as a substring
- mountPoint and pgDataSubdir next to each other, in the same section
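A sketch of the proposed layout, assuming hypothetical `mountPoint` / `pgDataSubdir` keys in one config section. PGDATA is derived by joining the two, so neither value has to be a substring of the other.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// PoolConfig is a hypothetical config section that keeps both settings together.
type PoolConfig struct {
	MountPoint   string `yaml:"mountPoint"`   // where the disk partition is mounted
	PgDataSubdir string `yaml:"pgDataSubdir"` // PGDATA location relative to MountPoint
}

// PgDataDir returns the absolute PGDATA path: MountPoint + PgDataSubdir.
func (c PoolConfig) PgDataDir() (string, error) {
	if !filepath.IsAbs(c.MountPoint) {
		return "", fmt.Errorf("mountPoint must be an absolute path, got %q", c.MountPoint)
	}
	sub := filepath.Clean(c.PgDataSubdir)
	// pgDataSubdir is an addition to mountPoint, so it must stay inside it.
	if filepath.IsAbs(sub) || sub == ".." || strings.HasPrefix(sub, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("pgDataSubdir must be relative to mountPoint, got %q", c.PgDataSubdir)
	}
	return filepath.Join(c.MountPoint, sub), nil
}
```

For example, `mountPoint: /var/lib/dblab` with `pgDataSubdir: data` gives `/var/lib/dblab/data`.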
❗ Lack of automated tests that would help stabilize quality (catch bugs and missing features)
- While full-featured tests in CI are not working now (because of --privileged; we need custom CI runners), we can still use automated tests (shell scripts) and run them manually when needed. !143 (closed)

Difficult to troubleshoot -- lack of clarity in error messages

Error messages should specify which container, which directory, etc. Examples of unclear messages:

Container is not ready yet. The current state is starting.
...
[FATAL]  Failed to run the data retrieval service: the data directory is not empty. Use 'forceInit' or empty the data directory

If something fails in a secondary container (such as "retriever") and the container then gets deleted, there is no way to understand what happened -- the container's log no longer exists. An example:

2020/08/13 14:36:53 [INFO]   Run job: logical-dump. Options: {/var/lib/dblab/db.dump postgres:11-alpine { 0   } {remote {nik-fivetran-test4.cpawoeaiqdwq.us-east-2.rds.amazonaws.com 5432 test3 postgres kDNJGMtguh3PWKihRyPT} <nil>} {[]} 2 <nil>}
2020/08/13 14:36:57 [INFO]   Running container: retrieval_logical_dump. ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d
2020/08/13 14:36:58 [INFO]   Stopping container ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d
2020/08/13 14:36:59 [INFO]   Stop container ID: 57c0423b6e80ee974574aa00643034d0f5b1930e4129337d73f60d241dc7f04d
2020/08/13 14:36:59 [FATAL]  Failed to run the data retrieval service: failed to readiness check: container health check failed

dockerImage -- 4 of them are too many
- Ideally, only one should be used. We can put WAL-G into our "extended" image, and then pgBackRest and others -- their binaries don't take much disk space (discussed with @fomin.list)
❗ confusion with paths. Currently, we have the path to PGDATA on the host machine plus an internal path in the dblab container, and we expect the configured path to work when running Postgres containers. But obviously, the path configured in the config is not the internal one -- in fact, it's the path on the host machine.
- consider using --volumes-from (https://docs.docker.com/engine/reference/commandline/run/#mount-volumes-from-container---volumes-from; https://www.ionos.com/community/server-cloud-infrastructure/docker/understanding-and-managing-docker-container-volumes/). Slack discussion: https://postgres-ai.slack.com/archives/CSXS2JV6W/p1597333212402900
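If the config keeps holding host paths, dblab needs to translate them into paths visible inside its own container. A sketch, assuming the list of the container's bind mounts is known; the `Mount` type is hypothetical, modeled on the Source/Destination fields that `docker inspect` reports under `Mounts`. With `--volumes-from`, the Postgres containers would inherit the same mounts, so the translated paths would be valid there too.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Mount is one bind mount of the dblab container: Source is the path on the
// host machine, Destination is where it appears inside the container.
type Mount struct {
	Source      string
	Destination string
}

// relWithin returns path relative to dir if path is dir itself or lies below it.
func relWithin(dir, path string) (string, bool) {
	rel, err := filepath.Rel(dir, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return rel, true
}

// hostToContainerPath translates a host path (as written in the config) into
// the path visible inside the dblab container, using the most specific mount.
func hostToContainerPath(hostPath string, mounts []Mount) (string, error) {
	best, bestRel := -1, ""
	for i, m := range mounts {
		rel, ok := relWithin(m.Source, hostPath)
		if !ok {
			continue
		}
		if best == -1 || len(filepath.Clean(m.Source)) > len(filepath.Clean(mounts[best].Source)) {
			best, bestRel = i, rel
		}
	}
	if best == -1 {
		return "", fmt.Errorf("host path %s is not mounted into the dblab container", hostPath)
	}
	return filepath.Join(mounts[best].Destination, bestRel), nil
}
```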
❗ images used internally must be pulled automatically. Right now, we get an error and have to run docker pull manually
❗ pg_dump & pg_restore, various issues
- -Fc -> -Fd
- confusing behaviour: for the single-threaded piped version, pg_restore has options to skip owners and privileges, while the separate pg_restore doesn't
  - Always add these options. Alternatively, allow specifying arbitrary options for both pg_dump and pg_restore, for full flexibility
- -j doesn't work with values other than 1
- vacuum analyze needs to inherit -j from the restore configuration (vacuumdb --analyze -j; and we don't need freeze) #143 (closed)
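A sketch of consistent argument lists for the three tools, assuming a hypothetical `DumpRestoreOptions` struct. It follows documented PostgreSQL behavior: parallel `pg_dump` (`--jobs` > 1) requires the directory format, and parallel `pg_restore` cannot read the archive from a pipe or stdin, which may explain why `-j` only works as 1 in the current piped `-Fc` setup. Owner and privilege flags are always added, extra user options are passed through, and `vacuumdb` reuses the same `--jobs` value with `--analyze` only (no `--freeze`).

```go
package main

import "strconv"

// DumpRestoreOptions is a hypothetical subset of the logical job settings.
type DumpRestoreOptions struct {
	SourceDB     string   // database to dump
	TargetDB     string   // database to restore into
	DumpDir      string   // directory-format dump location (-Fd)
	Jobs         int      // shared by pg_dump, pg_restore and vacuumdb
	ExtraDump    []string // arbitrary user-supplied pg_dump options
	ExtraRestore []string // arbitrary user-supplied pg_restore options
}

// jobs returns the --jobs value, falling back to 1 when unset.
func (o DumpRestoreOptions) jobs() string {
	if o.Jobs < 1 {
		return "1"
	}
	return strconv.Itoa(o.Jobs)
}

// dumpArgs: parallel pg_dump is only supported with the directory format.
func dumpArgs(o DumpRestoreOptions) []string {
	args := []string{"--format=directory", "--jobs=" + o.jobs(), "--file=" + o.DumpDir}
	args = append(args, o.ExtraDump...)
	return append(args, o.SourceDB)
}

// restoreArgs: owners and privileges are always skipped; the archive is read
// from a directory, not a pipe, so --jobs > 1 can take effect.
func restoreArgs(o DumpRestoreOptions) []string {
	args := []string{"--no-owner", "--no-privileges", "--jobs=" + o.jobs(), "--dbname=" + o.TargetDB}
	args = append(args, o.ExtraRestore...)
	return append(args, o.DumpDir)
}

// vacuumArgs: ANALYZE with the same parallelism as the restore; no --freeze.
func vacuumArgs(o DumpRestoreOptions) []string {
	return []string{"--analyze", "--jobs=" + o.jobs(), "--dbname=" + o.TargetDB}
}
```

The slices would then be passed as `exec.Command("pg_dump", dumpArgs(o)...)`, and likewise for `pg_restore` and `vacuumdb`.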

Misc

More config items (inherited from !135 (merged))

Triage

Add a reference for the container types that dblab creates, e.g. dblab_phr
(suggestion) Add environment variables based on the dblab config that are accessible to the custom tool, e.g. command: "pg_basebackup -X stream -D $DATADIR"
Hide the verbose progress output of docker pull
Investigate dblab performance degradation related to docker commands (checkpoint?): is it caused by updates or by an AWS problem?
Log the custom command in the "running restore command" message
Add a message to the custom tool section of the config: "Write your data to the datadir defined in the config"
For the physical job, allow defining the password via environment variables, as we do for the logical job (currently, the password for the physical job can only be set in the config)
Sync instance support for the custom tool is undetermined.
Remove the logger from the snapshot cleanup commands.

Edited Aug 21, 2020 by Artyom Kartasov