Skip to content

Lock retries in database migrations without subtransactions

We apply a so called "lock-retries" methodology in database migrations:

  1. DDL usually requires (explicit) locks
  2. Acquiring explicit locks stalls incoming traffic in the queue, from the time the lock is requested (not granted) - this can stall the site

The "lock-retries" methodology works like so:

  1. Open (sub-)transaction
  2. Set a lock_timeout to a low value (e.g. 100ms)
  3. Try the migration
  4. If we don't get the lock within the timeout: Wait and retry with a higher lock_timeout

This allows us to control for how long we stall the site while we try to acquire needed locks. Eventually we either give up (the migration fails) or we force-acquire the lock (we stall the site until we have gotten the lock).

Sub-transactions are only needed here if we wrap parts of the migration into lock-retries.

Proposal

Allow transactional migrations to opt-into lock-retries and apply it at the full transaction level (the whole migration).

class SomeMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  with_lock_retries!

  def up
    add_foreign_key(:requirements, :projects, column: :project_id, on_delete: :cascade)
  end

  # ...
end

In this case, we apply the same methodology but at the outer transaction level. If we don't acquire the lock, we retry the whole migration/transaction instead of only parts of it.

  1. This doesn't make a difference for simple transactional migrations that only have one with_lock_retries block anyways
  2. With multiple of those blocks, we can either decide to merge into one (a retry would be more expensive / more locks taken etc.) or split into multiple transactional migrations (preferrable)
  3. Non-transactional migrations can still make use of lock-retries, but also only with a full transaction (no subtransactions)
Edited by Andreas Brandl