Disable transaction timeouts when running database backups
What does this MR do and why?
This MR disables transaction timeouts by setting PostgreSQL's idle_in_transaction_session_timeout to 0, similar to the implementation in pg_dump.
We need this because, since the introduction of snapshots in the PostgreSQL backup rake task, we can run into transaction timeouts. This already happened in the build pipeline for dev.gitlab.org: a database backup takes about 10-15 minutes, but idle_in_transaction_session_timeout is set to 1 minute.
Before we used snapshots in the backup rake task, the code worked like this:
- For each database, we run pg_dump. This means that we could rely on pg_dump disabling idle_in_transaction_session_timeout.
After the introduction of snapshots, we have a more complex setup:
- For each database, take a snapshot
- Then, again for each database, we run pg_dump with that snapshot
- Roll back the snapshots
The last action (rolling back the transaction that created the snapshot) causes the failure: we try to roll back a transaction that hit the 1 minute timeout and was already rolled back. So we need to disable idle_in_transaction_session_timeout, which ensures the snapshots are kept during the pg_dump runs.
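The ordering described above can be sketched roughly as follows. This is an illustration, not the code in this MR: FakeConnection, backup_with_snapshots, and the database names are hypothetical, the exact SQL statements and their order are assumptions, and the block passed in stands in for the pg_dump run. The SET is issued before BEGIN and the RESET after ROLLBACK so that neither is undone by the rollback itself.

```ruby
# Hypothetical stand-in for one database connection.
# It records SQL instead of executing it, so no PostgreSQL server is needed.
class FakeConnection
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def execute(sql)
    @log << sql
  end
end

# Sketch of the three phases: snapshot every database, dump every database,
# then roll back every snapshot transaction.
def backup_with_snapshots(connections)
  # Phase 1: disable the idle timeout, open a transaction, export a snapshot.
  connections.each do |conn|
    conn.execute('SET idle_in_transaction_session_timeout TO 0')
    conn.execute('BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ')
    conn.execute('SELECT pg_export_snapshot()')
  end

  # Phase 2: the caller's block stands in for running pg_dump with the snapshot.
  # The transactions from phase 1 sit idle for this whole phase.
  connections.each { |conn| yield conn }

  # Phase 3: release the snapshots and restore the configured timeout.
  connections.each do |conn|
    conn.execute('ROLLBACK')
    conn.execute('RESET idle_in_transaction_session_timeout')
  end
end

# Illustrative usage with two made-up database names.
databases = [FakeConnection.new('main'), FakeConnection.new('ci')]
backup_with_snapshots(databases) { |conn| puts "dumping #{conn.name}" }
databases.each { |conn| puts "#{conn.name}: #{conn.log.join('; ')}" }
```

The point of the sketch is that the transactions opened in phase 1 stay idle for the whole of phase 2, which is where the 1 minute timeout would otherwise fire.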
Implementation
A new class is added in this MR: Gitlab::Database::WithoutTransactionTimeouts
This class has these methods:
- disable_timeouts: Set idle_in_transaction_session_timeout to 0
- restore_timeouts: Restore the originally configured value of idle_in_transaction_session_timeout using PostgreSQL's RESET
I originally had a more complex implementation that remembered the original value in this class, but then I learned about RESET, so we can rely on PostgreSQL to restore the original setting. As a result, this class became more trivial than originally anticipated.
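A minimal sketch of what such a class might look like, assuming it wraps a connection object that responds to execute. The constructor signature and the RecordingConnection helper are assumptions made for illustration; only the class name, the two method names, and the use of RESET come from the description above.

```ruby
# Hypothetical stand-in for a database connection: it only records the SQL
# it is asked to run, so the sketch works without a PostgreSQL server.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

module Gitlab
  module Database
    # Sketch only: taking the connection in the constructor is an assumption.
    class WithoutTransactionTimeouts
      SETTING = 'idle_in_transaction_session_timeout'

      def initialize(connection)
        @connection = connection
      end

      # A value of 0 disables the timeout for the current session.
      def disable_timeouts
        @connection.execute("SET #{SETTING} TO 0")
      end

      # RESET puts the setting back to the session default, so the class
      # does not need to remember the previous value itself.
      def restore_timeouts
        @connection.execute("RESET #{SETTING}")
      end
    end
  end
end

# Illustrative usage against the recording stand-in.
connection = RecordingConnection.new
timeouts = Gitlab::Database::WithoutTransactionTimeouts.new(connection)
timeouts.disable_timeouts
timeouts.restore_timeouts
puts connection.statements
```

Because RESET needs no stored state, the class holds nothing but the connection.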
I am now not sure whether this is overengineering: a distinct class for this task could make sense, but on the other hand it is very trivial now. Since I already have the class, let's leave it for now.
How to set up and validate locally
- Configure PostgreSQL with an idle_in_transaction_session_timeout of 1 (meaning a transaction will time out after 1 ms)
- On branch master: run the backup: bundle exec rake gitlab:backup:create. It should fail and throw an error
- Switch to this branch and run the backup again. Now the error will not be thrown
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #389553 (closed)