Investigate, document and prevent or fix postgres database corruption when two all-in-one images run at the same time with same volume
We encountered an issue where Postgres failed to start inside the all-in-one image with the following error:
PANIC: could not locate a valid checkpoint record
It is believed this error could have occurred because two `baserow/baserow` containers were running with the same `-v baserow_data:/baserow/data/` volume mounted into both.
Alternatively, it might be possible to trigger this by sending SIGKILL to the container (for example with `docker kill`), which prevents Postgres from shutting down correctly.
Issue Actions
- Investigate and try to reproduce the above error
- Add instructions to our documentation on how to prevent this. The upgrade guide in the Install On Docker docs shouldn't have the user only `docker stop` the old container before starting a new one, but also `docker rm` the old container AND use `docker ps` to double check nothing is running.
- Add to our docs how to fix this error by using something like `docker stop baserow && docker run -it --entrypoint /bin/bash --rm baserow/baserow:1.14.0 "su postgres && /usr/lib/postgresql/11/bin/pg_resetwal -f /baserow/data/postgres"`. It should have the user take a backup of their entire data volume prior to running this command, as `pg_resetwal` can delete data.
- Investigate if we can detect and prevent this situation (for example, using temporary lock files in the data volume: if one is seen, then another container must be running, so crash or something).
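
One possible shape for the lock-file idea, as a rough sketch rather than a tested design: take an advisory `flock` on a file inside the data volume at container startup and refuse to start if the lock is already held. The function name `acquire_data_lock`, the file name `.baserow_container.lock` and the use of file descriptor 9 are all hypothetical. The sketch assumes the `flock` utility from util-linux is available in the image, and that both containers run on the same host with a local volume, so they share the kernel's lock table.

```shell
#!/usr/bin/env bash
# Sketch only: the lock file name and function name are hypothetical,
# not something the existing image provides.

# Try to take an exclusive, non-blocking lock on a file in the data volume.
# Returns 0 if this shell now holds the lock, non-zero if it is already held.
acquire_data_lock() {
  local data_dir="$1"
  local lock_file="$data_dir/.baserow_container.lock"  # hypothetical name

  # Open the lock file on file descriptor 9 and leave it open for the
  # lifetime of this shell. The lock goes away when the descriptor is
  # closed, which also happens if the process is killed.
  exec 9>"$lock_file"

  # -n: fail immediately instead of waiting if the lock is already held.
  flock -n 9
}

# Example startup guard (the data path is the one used in this issue):
# if ! acquire_data_lock "/baserow/data"; then
#   echo "Another container appears to be using this data volume." >&2
#   exit 1
# fi
```

Compared with a plain marker file, a kernel-held lock would not be left behind after a SIGKILL, so a crashed container should not block the next start. Whether this behaves reliably on every volume driver would still need investigating.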