Fix spec failures due to PG::LockNotAvailable errors

Stan Hu requested to merge sh-fix-issue-207088-try2 into master

We were seeing a high number of transient failures in the migration jobs because with_lock_retries leaked a short, non-zero lock_timeout (e.g. 100 ms) when used inside a Rails change method. If the PostgreSQL autovacuum process happened to be running, it held a lock on the table it was vacuuming. During the migration rollback, any DDL operation that needed a lock on that table would wait behind the existing lock, exceed the short lock_timeout, and fail with PG::LockNotAvailable.

SET LOCAL was used so that lock_timeout would not leak outside of the current transaction, but the parent transaction still retained the value.

To avoid this issue, we should define separate up and down methods so that we don't rely on the Rails magic to reverse a migration. This ensures lock retries are used properly in both directions and prevents lock_timeout from leaking during a migration rollback.
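The shape of such a migration can be sketched as follows. This is a minimal illustration that runs without Rails or PostgreSQL, not the code from this merge request: RecordingConnection, the LockRetries module, the AddExampleIndex class, and the table and index names are all hypothetical, and the with_lock_retries stand-in makes a single attempt rather than retrying as the real helper does.

```ruby
# Hypothetical stand-in for a database connection: it only records the SQL
# it is asked to run, so the sketch works without Rails or PostgreSQL.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

# Simplified stand-in for with_lock_retries (assumption: one attempt only,
# no retry loop). The short lock_timeout is set inside its own transaction.
module LockRetries
  def with_lock_retries(timeout: '100ms')
    connection.execute('BEGIN')
    connection.execute("SET LOCAL lock_timeout TO '#{timeout}'")
    yield
    connection.execute('COMMIT')
  end
end

# Hypothetical migration with explicit up and down methods instead of a
# single reversible change method.
class AddExampleIndex
  include LockRetries

  attr_reader :connection

  def initialize(connection)
    @connection = connection
  end

  # Forward direction: the DDL runs inside its own lock-retry block.
  def up
    with_lock_retries do
      connection.execute('CREATE INDEX index_examples_on_name ON examples (name)')
    end
  end

  # Rollback direction: written out explicitly, with its own lock-retry block,
  # rather than derived by reversing the forward direction.
  def down
    with_lock_retries do
      connection.execute('DROP INDEX index_examples_on_name')
    end
  end
end

connection = RecordingConnection.new
migration = AddExampleIndex.new(connection)
migration.up
migration.down
puts connection.statements
```

Writing down explicitly means the rollback sets and scopes its own lock_timeout instead of inheriting whatever the reversed change block left behind.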

Closes #207088 (closed)

Edited by Yorick Peterse
