Skip to content

Add option to order locks on add_concurrent_foreign_key

Patrick Bair requested to merge pb-order-locks-when-adding-foreign-keys into master

What does this MR do?

During deployment of !65795 (merged), the migration failed due to deadlocks with another process. Upon investigation, the conflicting process creates an event, and then creates push_event_payloads for that event. Unfortunately, the migration first failed when adding the foreign key, and then again when trying to obtain the explicit locks on the two tables.

A possible solution to this problem is to reverse the order in which the locks are required, to better match application locking order. For the explicit lock statement, this can be easily done. This MR handles the other piece of that, by modifying the add_concurrent_foreign_key helper to first explicit lock the tables in a certain order before creating the foreign key.

How to setup and validate locally

Example of how this works:

PG session 1

BEGIN;

LOCK TABLE events in SHARE ROW EXCLUSIVE MODE;

PG session 2

BEGIN;

ALTER TABLE push_event_payloads
ADD CONSTRAINT testfk
FOREIGN (event_id_convert_to_bigint) REFERENCES events (id) NOT VALID;

At this point the ALTER TABLE will hang, waiting for the events lock. In a third session you can verify the locks:

SELECT pg_class.relname, pg_locks.pid, pg_locks.mode
FROM pg_locks
LEFT JOIN pg_class
ON pg_class.oid = pg_locks.relation
WHERE locktype = 'relation'
  AND relname IN ('events', 'push_event_payloads');

       relname       |  pid  |         mode
---------------------+-------+-----------------------
 events              | 94100 | ShareRowExclusiveLock
 push_event_payloads | 13124 | ShareRowExclusiveLock
 events              | 13124 | ShareRowExclusiveLock

Now, if we go back to session 1:

PG session 1

LOCK TABLE push_event_payloads in SHARE ROW EXCLUSIVE MODE;

ERROR:  deadlock detected
DETAIL:  Process 94100 waits for ShareRowExclusiveLock on relation 12549384 of database 12544519; blocked by process 13124.
Process 13124 waits for ShareRowExclusiveLock on relation 12547458 of database 12544519; blocked by process 94100.
HINT:  See server log for query details.
Time: 1001.650 ms (00:01.002)

We get a deadlock. But if we repeat the same experiment, but in session 2 before we create the foreign key, we:

LOCK TABLE events, push_event_payloads IN SHARE ROW EXCLUSIVE MODE;

That transaction will hang until the transaction session completes. We can still get a lock timeout, but we use with_lock_retries which will retry that operation again.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Patrick Bair

Merge request reports