$INCREMENT(gvn) works correctly within a transaction with NOISOLATION enabled for gvn
Final Release Note
$INCREMENT(gvn) works correctly within a transaction even if gvn has the NOISOLATION feature turned on. Previously, if multiple processes attempted the $INCREMENT() at the same time, it was possible for the post increment value to be incorrectly set to ""
(i.e. the empty string), as the result of a regression introduced by the code for [#353 (closed)] in r1.24. This was only encountered in a development environment and not reported by a user. [#711 (closed)]
Description
This is an issue that was noticed in internal testing. The r124/ydb353
automated subtest (written as part of fixing #353 (closed)) in the YDBTest
repo failed with the following diff in rare cases (as rare as 1 in 10,000 test runs).
--- ydb353/ydb353.diff ---
14c14
< ##DIFFERENT EXPRESSION##^\^x=[0-9]*$## VS ##^x=""##
---
> ^x=""
The gvn ^x
is updated in the test inside of a TSTART
/TCOMMIT
fence and using $INCREMENT
. The test also turns on NOISOLATION
for ^x
. And then does 25,000 $INCREMENT(^x)
calls across 5 different concurrently running processes and expects the final value of ^x to be a decimal number greater than 0 and less than 25,000. Hence the ^x=[0-9]*
expression in the reference file. But we end up seeing a value of "" in the actual test run.
The post increment value should never end up being the empty string. So analyzed this failure some more.
It turns out to be a very subtle issue with the #353 (closed) changes that got released in r1.24. As part of the #353 (closed) changes, NOISOLATION
was allowed with $INCREMENT
. But pre-existing code in gvcst_put.c
was coded in such a way that it would work correctly all the time as long as the assumption that NOISOLATION
and $INCREMENT
are disallowed was valid. And would work correctly almost all the time
even if the assumption was not valid. But it would not work correctly in very rare cases if the assumption was not valid. And it is exactly that rare case which the ydb353
subtest exercised.
The fix is to use the right variable value
instead of the variable val->str
in gvcst_put.c
in the code block where we set up the recompute array for updates to NOISOLATION globals. Both the variables have the same value in most cases except when the caller is $INCREMENT
. In that case, val->str
is the empty string whereas value
points to the proper numeric post increment value. And so using val->str
results in the post increment value being incorrectly set to ""
. Using value
would set it to the correct value.
While I did find the fix after a review of the code, what was puzzling for me was why the test was only failing so rarely. I was expecting it to fail more often. Below follows an account of why that is the case.
- While the recompute array is set up in
gvcst_put.c
for every TP update to aNOISOLATION
global, it is only used if/when the need for a recomputation is seen at commit time intp_tend()
. And that happens only if a restartable situation is noticed due to concurrent updates. And so it is normally rare to encounter this scenario. - But the
ydb353
subtest runs 5 processes concurrently and so this scenario would be encountered almost always. The reason why this failure is really rare is that even if^x
ends up with the incorrect value""
in many of the thousands of updates done by the test, it would be fixed by a next$INCREMENT
done by the test assuming that does not encounter a restart/need-for-recomputation. And since the test does no more validation other than checking that the final value is some decimal integer, it would pass almost all the time. The only exception is if thelast
update (of the 25,000 updates) done by the test ends up in this restartable state and needs to use the recompute array. - Now that I know the issue, I realize the existing ydb353 subtest would have caught this failure reliably with some simple test changes. I did make those few-line changes and the test failed right away.
In a nutshell, the issue is an edge case bug in global variables that have NOISOLATION turned on and use $INCREMENT inside a TP transaction. This is a use case that is disallowed in the upstream/GT.M code and hence it is not an upstream issue.
Draft Release Note
$INCREMENT(gvn)
works correctly inside a TSTART
/TCOMMIT
fence even if gvn
has the NOISOLATION feature turned on. Previously, if multiple processes attempt $INCREMENT
at the same time, it was possible for the post increment value to be incorrectly set to ""
(i.e. the empty string). This is a regression introduced by #353 (closed) in YottaDB r1.24.