Disable transaction timeouts when running database backups

What does this MR do and why?

This MR will disable transaction timeouts by setting PostgreSQL idle_in_transaction_session_timeout to 0, similar to the implementation in pg_dump.

We need this because, since snapshots were introduced in the PostgreSQL backup rake task, backups can run into transaction timeouts. This has already happened in the build pipeline for dev.gitlab.org: a database backup takes about 10-15 minutes, but idle_in_transaction_session_timeout is set to 1 minute.

Before we used snapshots in the backup rake task, the code worked like this:

  • For each database, we ran pg_dump. This meant we could rely on pg_dump disabling idle_in_transaction_session_timeout.

After the introduction of snapshots, we have a more complex setup:

  • For each database, take a snapshot
  • Then, again for each database, we run pg_dump with that snapshot.
  • Roll back the snapshot transactions

The last action (rolling back the transaction that created the snapshot) is what fails: we try to roll back a transaction that hit the 1 minute timeout and was already rolled back. So we need to disable idle_in_transaction_session_timeout, which ensures the snapshots are kept for the duration of the pg_dump runs.
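To make the ordering concrete, here is a minimal sketch of the snapshot flow with the timeout disabled. It is not the code from this MR: `backup_with_snapshots`, `FakeConnection`, and the `dumper` callable are hypothetical names, and the exact placement of SET and RESET relative to the transaction is an assumption. The SQL statements themselves (`pg_export_snapshot()`, `SET`, `RESET`) are standard PostgreSQL.

```ruby
# Hypothetical sketch of the backup ordering; not the implementation in this MR.
# `connections` maps a database name to an object responding to #execute(sql).
# `dumper` is any callable that would run pg_dump with the given snapshot id.
def backup_with_snapshots(connections, dumper)
  # Step 1: for each database, disable the timeout and take a snapshot.
  snapshots = connections.map do |name, conn|
    conn.execute('SET idle_in_transaction_session_timeout = 0')
    conn.execute('BEGIN ISOLATION LEVEL REPEATABLE READ')
    [name, conn.execute('SELECT pg_export_snapshot()')]
  end

  # Step 2: again for each database, dump using that snapshot.
  snapshots.each { |name, snapshot_id| dumper.call(name, snapshot_id) }
ensure
  # Step 3: roll back the snapshot transactions and restore the setting.
  connections.each_value do |conn|
    conn.execute('ROLLBACK')
    conn.execute('RESET idle_in_transaction_session_timeout')
  end
end

# Stand-in connection that records statements instead of talking to PostgreSQL.
class FakeConnection
  attr_reader :statements

  def initialize(name)
    @name = name
    @statements = []
  end

  def execute(sql)
    @statements << sql
    # Return a fake snapshot id when a snapshot is exported.
    "snap-#{@name}" if sql.include?('pg_export_snapshot')
  end
end
```

Without the SET in step 1, the open transactions sit idle while pg_dump runs in a separate process, which is exactly the window in which the 1 minute timeout fires.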

Implementation

A new class is added in this MR: Gitlab::Database::WithoutTransactionTimeouts

This class has these methods:

  • disable_timeouts: Set idle_in_transaction_session_timeout to 0, which disables the timeout
  • restore_timeouts: Restore the original configured value of idle_in_transaction_session_timeout using PostgreSQL 'RESET'
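The two methods could look roughly like the sketch below. This is an illustration under assumptions, not the class from the MR: the constructor taking a connection, the un-namespaced class name, and the `RecordingConnection` stand-in are all hypothetical. Only the method names and the use of SET and RESET come from the description above.

```ruby
# Hypothetical sketch; the real class lives under Gitlab::Database and may differ.
class WithoutTransactionTimeouts
  SETTING = 'idle_in_transaction_session_timeout'

  # connection: any object responding to #execute(sql)
  # (assumed here; for example an ActiveRecord connection).
  def initialize(connection)
    @connection = connection
  end

  # A value of 0 disables the timeout for the current session.
  def disable_timeouts
    @connection.execute("SET #{SETTING} = 0")
  end

  # RESET returns the parameter to the value it would have had
  # if no SET had been issued in this session.
  def restore_timeouts
    @connection.execute("RESET #{SETTING}")
  end
end

# Stand-in connection that records statements, so the sketch runs without a database.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

connection = RecordingConnection.new
timeouts = WithoutTransactionTimeouts.new(connection)
timeouts.disable_timeouts
timeouts.restore_timeouts
puts connection.statements
```

Because RESET needs no stored state, the class does not have to read and remember the previous value before changing it.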

I originally had a more complex implementation that remembered the original value in this class. Then I learned about RESET, which lets us rely on PostgreSQL to restore the original setting, so the class became more trivial than anticipated.

I am not sure whether this is now overengineering: a distinct class for this task could make sense, but on the other hand it is very trivial now. Since I already have the class, let's leave it for now.

How to set up and validate locally

  • Configure PostgreSQL with an idle_in_transaction_session_timeout of 1 (meaning an idle transaction will time out after 1 ms)
  • On branch master, run the backup: bundle exec rake gitlab:backup:create. It should fail with an error
  • Switch to this branch and run the backup again. Now the error will not be thrown

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #389553 (closed)

Edited by Rutger Wessels