Retry DDL migrations with low lock_timeout
Extracted from the discussion in #34325 (closed).

To increase the chance that DDL migrations succeed, we want to implement optional behavior that lets us control how a migration is retried.
Let's look at an example:
```ruby
def up
  drop_table :foo
end
```
Let's assume `foo` -> `bar` through a foreign key constraint. This would require an exclusive lock on `bar`, too, because of the FK. Let's also assume `bar` is a table with a lot of traffic.
## Today's behavior
In the worst case, the migration fails after hitting the statement timeout (15s), because it may be unable to acquire the lock on `bar` within that time frame.

While the `DROP TABLE` is waiting, any further lock requests on `bar` cannot be granted (lock queue). This means that all reads and writes to this table are blocked for up to 15s, which in turn stalls every incoming request that interacts with `bar`.
## Proposed behavior
```ruby
def up
  with_lock_retries(timeout: 500.milliseconds) do
    drop_table :foo
  end
end
```
Here, we set the `lock_timeout` to a low value (500ms) and attempt the migration step (`DROP TABLE foo`). If we don't manage to acquire the lock (i.e., we hit a lock timeout exception), we retry.
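A minimal sketch of how such a retry loop could work. The helper name matches the example above, but the error class, the `max_attempts` parameter, and the simulated lock failure are hypothetical stand-ins for what would really be a Postgres `lock_timeout` error inside a transaction:

```ruby
# Hypothetical stand-in for the lock timeout error Postgres would raise.
class LockTimeoutError < StandardError; end

# Sketch: retry the yielded DDL step until it succeeds or we give up.
def with_lock_retries(timeout:, max_attempts: 10)
  attempts = 0
  begin
    attempts += 1
    # In a real migration this would run inside a transaction that first
    # executes `SET LOCAL lock_timeout = '#{timeout}ms'`, then the DDL.
    yield
  rescue LockTimeoutError
    raise if attempts >= max_attempts
    sleep(0) # placeholder for a backoff sleep between attempts
    retry
  end
end

# Usage: simulate a lock that only becomes free on the third attempt.
calls = 0
with_lock_retries(timeout: 500) do
  calls += 1
  raise LockTimeoutError if calls < 3
end
calls # => 3
```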
The retry can be implemented with:

- linear/exponential backoff
- an overall timeout for the migration (10 minutes or similar)
- increasing `lock_timeout` up to an upper threshold
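One way to combine these three ideas is to precompute a retry schedule: the `lock_timeout` grows toward an upper threshold while the sleep between attempts backs off exponentially, and the whole schedule is bounded by an overall budget. All constants below are illustrative, not proposed defaults:

```ruby
# Illustrative bounds (not proposed defaults).
MAX_LOCK_TIMEOUT_MS = 10_000            # upper threshold for lock_timeout
OVERALL_BUDGET_MS   = 10 * 60 * 1000    # ~10 minutes for the whole migration

# Build the list of attempts: each entry doubles both the lock_timeout
# (capped at the threshold) and the backoff sleep, until the next attempt
# would exceed the overall budget.
def retry_schedule(initial_lock_timeout_ms: 500, base_sleep_ms: 100)
  schedule = []
  elapsed  = 0
  attempt  = 0
  loop do
    lock_timeout = [initial_lock_timeout_ms * 2**attempt, MAX_LOCK_TIMEOUT_MS].min
    sleep_ms     = base_sleep_ms * 2**attempt
    break if elapsed + lock_timeout > OVERALL_BUDGET_MS
    schedule << { lock_timeout_ms: lock_timeout, sleep_ms: sleep_ms }
    elapsed += lock_timeout + sleep_ms
    attempt += 1
  end
  schedule
end

schedule = retry_schedule
schedule.first # => { lock_timeout_ms: 500, sleep_ms: 100 }
```

The cap on `lock_timeout` keeps the worst-case blocking of other traffic bounded even on late attempts, while the growing sleep gives the lock queue on `bar` time to drain between tries.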
We can start by marking this new behavior as opt-in where needed (as in the example above), or make it the default for all migrations.
This way, we:

- have a guaranteed maximum blocking time for other reads/writes, substantially lower than the statement timeout
- increase the likelihood of a successful migration even under high traffic.
What we don't get from this is the ability to decouple the success of these migrations from the success of a deploy. This may be a follow-up issue.