LOCK commands work correctly in a rare case of multiple hash collisions
Final Release Note
LOCK commands work correctly when more than 31 subscripts at a given lock name subscript level (across all processes concurrently using a particular database region) hash to the same value. $$^%PEEKBYNAME("sgmnt_data.lock_hash_bucket_full_cntr",<region>)
now returns the number of times such a rare event was seen (and handled correctly) in . Previously, effective YottaDB r1.20, it was possible for LOCK commands in this rare case to spin and incorrectly seize ownership of the lock resource from a concurrently holding process. [#297 (closed)]
Description
GT.M V6.3-003 (which was incorporated in YottaDB r1.20) introduced an enhancement for LOCK commands as part of GTM-8680 (GT.M LOCK commands perform better with large numbers of locks, and particularly with large numbers of processes acquiring the locks).
GTM-8680 implemented a hash technique to speed up searching for a given subscripted lock name in the lock space table in database shared memory. The hash technique used is Hopscotch hashing (https://en.wikipedia.org/wiki/Hopscotch_hashing) with H set to 32. This meant that if there is a hash collision, then if a subscript hashed to bucket i, it could end up eventually in any bucket from i to i+H-1 (i.e. i+31). But if all those buckets are already occupied, the technique calls for resizing the hash table. But since the hash table is in shared memory (which has a fixed size), no resizing is possible there.
While reviewing the associated code changes, we noticed that the implementation did not account for the resizing case. No errors are issued. Towards, that we came up with a test case that uses more than 32 subscripts all of which hash to the same bucket to see how things work. And we noticed two major issues.
-
If P1 owns many lock resources with the same hash value, it is possible for P2 to seize ownership of some of those lock resources while P1 still has not released any of those locks. The chances of this scenario happening in practice might be small because we need 32 or more subscripts to hash to the same value which is unlikely. Nevertheless, the consequences are serious to the M application since they cannot rely on the LOCK command anymore.
-
Once 32 subscripts hash to the same value, it is possible for LOCK commands to even hang indefinitely in a spin loop.
Draft Release Note
LOCK commands work correctly in rare cases when there are more than 31 subscripts at a given lock name subscript level (across all processes concurrently using a particular database file lock space) that hash to the same value. A $$^%PEEKBYNAME("sgmnt_data.lock_hash_bucket_full_cntr",REGION) now returns the # of times such a rare event was seen (and internally handled correctly) in the region named REGION. Previously, in this situation, it was possible for LOCK commands to spin-loop and incorrectly seize ownership of the lock resource from a concurrently holding process. This issue was introduced in GT.M V6.3-003 as part of GTM-8680 and so exists in YottaDB r1.20 and r1.22.