Investigate, document and prevent or fix postgres database corruption when two all-in-one images run at the same time with the same volume
We encountered an issue where postgres failed to start inside the all-in-one image, with the following error:
PANIC: could not locate a valid checkpoint record
It is believed this error could have occurred because two `baserow/baserow` containers were running with the same `-v baserow_data:/baserow/data/` volume mounted into both.
Alternatively, it might be possible to trigger this by sending SIGKILL to the container (for example with `docker kill`), which prevents postgres from shutting down correctly.
- Investigate and try to reproduce the above error
- Add instructions to our documentation on how to prevent this. The upgrade guide in the Install On Docker docs shouldn't have the user only `docker stop` the old container before starting a new one; it should also have them `docker rm` the old container and use `docker ps` to double-check that nothing is running.
- Add to our docs how to fix this error by using something like `docker stop baserow && docker run -it --entrypoint /bin/bash --rm baserow/baserow:1.14.0 "su postgres && /usr/lib/postgresql/11/bin/pg_resetwal -f /baserow/data/postgres"`. It should have the user take a backup of their entire data volume prior to running this command, as `pg_resetwal` can delete data.
- Investigate if we can detect and prevent this situation (for example, by using temporary lock files in the data volume: if a lock is seen, another container must be running, so the new container should crash or similar)
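
As a starting point for the lock file idea in the last item, here is a minimal sketch of a guard that a container startup script could run before launching postgres. It assumes the data volume is mounted at a directory passed as the first argument; the function names `acquire_data_lock` and `release_data_lock` and the `.container.lock` directory name are hypothetical and do not exist in Baserow today. It relies on `mkdir` being atomic, so only one of two containers racing on the same volume can create the lock.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: these helpers are illustrative, not part of Baserow.

# Try to take an exclusive lock on the data directory given as $1.
# Returns 0 if the lock was acquired, 1 if another holder already exists.
acquire_data_lock() {
    local data_dir="$1"
    local lock_dir="$data_dir/.container.lock"  # assumed lock location
    # mkdir fails if the directory already exists, which makes it an atomic test-and-set.
    if mkdir "$lock_dir" 2>/dev/null; then
        # Record who holds the lock to help with debugging.
        hostname > "$lock_dir/owner"
        return 0
    fi
    echo "Lock $lock_dir exists; another container may be using this volume." >&2
    return 1
}

# Release the lock on the data directory given as $1 (call on clean shutdown).
release_data_lock() {
    local data_dir="$1"
    rm -rf "$data_dir/.container.lock"
}
```

A startup script might call `acquire_data_lock /baserow/data || exit 1` and register `release_data_lock /baserow/data` in a shutdown trap. Note that a container stopped with SIGKILL cannot run its trap and would leave a stale lock behind, so a real implementation would also need a way to detect and clear stale locks; that part is not covered here.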