Fix spec failures due to PG::LockNotAvailable errors

We were seeing a high number of transient failures in the migration jobs
because with_lock_retries leaked non-zero, short lock_timeout values
(e.g. 100 ms) when used inside a Rails change method. If the PostgreSQL
autovacuum process happened to be running, it would lock the table that
it was vacuuming. During the migration rollback, if the DDL operation
needed a lock on the table, it would wait on the existing table lock,
exceed the short lock_timeout, and fail.

Even though SET LOCAL was used to ensure lock_timeout didn't leak
outside of the current transaction, the parent transaction would still
retain that value.
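
The scoping behaviour described above can be illustrated with a toy model. This is only a sketch in plain Ruby, not a database client: the ToySession class and its methods are hypothetical, and it assumes the lock-retry helper runs its block in a nested transaction (a savepoint) inside the migration's own transaction. In PostgreSQL, a value set with SET LOCAL in a subtransaction that completes successfully stays in effect until the enclosing transaction ends, which is what the model mimics.

```ruby
# Toy model of how a SET LOCAL value is scoped; it does not talk to PostgreSQL.
class ToySession
  DEFAULT_LOCK_TIMEOUT = '0' # PostgreSQL default: no lock timeout

  def initialize
    @frames = [] # one hash of SET LOCAL overrides per open (sub)transaction
  end

  # Effective value: innermost override wins, otherwise the session default.
  def lock_timeout
    frame = @frames.reverse.find { |f| f.key?(:lock_timeout) }
    frame ? frame[:lock_timeout] : DEFAULT_LOCK_TIMEOUT
  end

  def set_local_lock_timeout(value)
    # Outside a transaction, PostgreSQL warns and the setting has no effect.
    return if @frames.empty?

    @frames.last[:lock_timeout] = value
  end

  def transaction
    @frames.push({})
    yield
    finished = @frames.pop
    # A nested block that completes hands its overrides to the parent;
    # when the outermost transaction ends, they are simply discarded.
    @frames.last.merge!(finished) unless @frames.empty?
  rescue StandardError
    @frames.pop # a rolled-back block discards its overrides
    raise
  end
end

session = ToySession.new
observed = {}

session.transaction do                      # the migration's transaction
  session.transaction do                    # nested block opened by a helper
    session.set_local_lock_timeout('100ms')
  end
  observed[:after_inner] = session.lock_timeout # still '100ms' in the parent
end
observed[:after_outer] = session.lock_timeout   # back to the default '0'
```

In this model, any statement that runs later in the parent transaction, outside the helper's block, would still be subject to the short timeout without the retry logic around it.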

To avoid this issue, we should define separate up and down methods so
that we don't rely on the Rails magic to reverse a migration. This
ensures lock retries are used properly in both directions and prevents
lock_timeout from leaking during a migration rollback.
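
As a rough sketch of the shape this takes, the migration below wraps the DDL in with_lock_retries in both directions. To keep the example runnable without Rails, SketchMigration is a hypothetical stand-in that only records calls; in a real migration, with_lock_retries, add_column and remove_column come from the framework and its helpers. The class, table and column names are illustrative assumptions, not taken from this change.

```ruby
# Hypothetical stand-in for the migration base class. It records each DDL
# call together with whether it ran inside a with_lock_retries block.
class SketchMigration
  attr_reader :log

  def initialize
    @log = []
    @in_lock_retries = false
  end

  def with_lock_retries
    @in_lock_retries = true
    yield
  ensure
    @in_lock_retries = false
  end

  def add_column(table, column, _type)
    record(:add_column, table, column)
  end

  def remove_column(table, column)
    record(:remove_column, table, column)
  end

  private

  def record(operation, table, column)
    @log << [operation, table, column, @in_lock_retries]
  end
end

# Illustrative migration with explicit up and down methods instead of change.
class AddExampleColumn < SketchMigration
  def up
    with_lock_retries do
      add_column :examples, :note, :text
    end
  end

  # Written out by hand so the rollback DDL also runs under lock retries.
  def down
    with_lock_retries do
      remove_column :examples, :note
    end
  end
end

migration = AddExampleColumn.new
migration.up
migration.down
```

Because down is spelled out rather than derived from change, the rollback path goes through the same retry wrapper as the forward path.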
Closes #207088