STACKOFLOW error and Assert failures even after GTM-9333 changes to V7.0-001
Final Release Note
Description
See YDBTest#543 (comment 1680950111) for background on this issue.
GTM-9333, a change in GT.M V7.0-001 that was merged into YottaDB r2.00, had the following release note:
At any point GT.M recognizes only one request for any type of asynchronous processing, including CTRL-C, CTRAP, MUPIP INTRPT, $ZMAXTPTIME, and $ZTIMEOUT. Note that MUPIP INTRPT (SIGUSR1) and untrapped CTRL-C can interrupt other asynchronous events, and an untrapped CTRL-C cancels any other pending asynchronous processing. Previously, GT.M could inappropriately attempt to handle multiple requests for a single type of asynchronous operation, which caused unintended behavior, most likely a stack overflow. The workaround was to avoid rapid interrupting. (GTM-9333)
While trying to come up with a test case for this change in YDBTest#543, I noticed that a test case that repeatedly sends SIGUSR1 and SIGINT signals ends up with various assert failures in a Debug build and a STACKOFLOW error in a Release build. The STACKOFLOW symptom was seen in V7.0-000, V7.0-001, and even the latest V7.1-002.
So clearly the GTM-9333 changes do not do what the release note documents.
Below is the test case that demonstrates the STACKOFLOW error using a Release build. One can set the gtm_mstack_size / ydb_mstack_size env var to the smallest value possible (25 below) to get the error sooner. Without that set, it just takes a bit longer to see the error.
$ cat test.m
test ;
 kill ^pid
 set jobstr="job signal^test:(output=""signal_test.mjo"":error=""signal_test.mje"")"
 xecute jobstr
 set ^child=$zjob
 set ^pid=$job
 for i=1:1 zwrite i hang 1
 quit
signal ;
 for  quit:$data(^pid)  hang 0.001
 set quit=0
 for i=1:1 quit:quit  do
 . if $zsigproc(^pid,10) ; signal 10 = SIGUSR1 (what MUPIP INTRPT sends)
 . for j=1:1:10 quit:quit  do
 . . set x=$zsigproc(^pid,2) ; signal 2 = SIGINT (CTRL-C)
 . . if x zwrite i,j,x set quit=1 quit
 . . hang 0.001
 quit
$ export gtm_mstack_size=25
$ mumps -run test
.
.
YDB>%YDB-I-CTRLC, CTRL_C encountered
%YDB-F-STACKOFLOW, Stack overflow
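When repeating this run across several builds, it may help to classify each run's outcome automatically. The following is only a minimal sketch and is not part of the original test: `classify_run` is a hypothetical helper name, and it assumes the combined stdout/stderr of one run has been captured to a file. It matches only on message text quoted in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): classify the captured
# output of one reproducer run by the symptoms quoted in this issue.
classify_run() {
    # $1 = file holding the combined stdout/stderr of one run
    if grep -q -e '-F-STACKOFLOW' "$1"; then
        echo STACKOFLOW      # Release build symptom
    elif grep -q -e 'Assert failed in' "$1"; then
        echo ASSERT          # Debug build symptom
    else
        echo NONE            # neither symptom found in the captured output
    fi
}

# Example use (assumes test.m is in the current directory; the coreutils
# `timeout` command and the 600-second limit are assumptions, used only to
# bound a run that never fails):
#   export gtm_mstack_size=25
#   timeout 600 mumps -run test > run.out 2>&1
#   classify_run run.out
```

Matching on `-F-STACKOFLOW` rather than the full `%YDB-F-STACKOFLOW` text is a deliberate choice so that the check does not depend on the message prefix.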
And below are the assert failures seen in GT.M and YottaDB releases using the above test case (copied from YDBTest#543 (comment 1678507313)).
List of GT.M assert failures
Assert failed in V70001/sr_port/deferred_events.c line 111 for expression (no_event == outofband || (event_type == outofband))
Assert failed in V71000/sr_port/deferred_events.c line 130 for expression (no_event == outofband || (event_type == outofband))
Assert failed in V71001/sr_port/mdb_condition_handler.c line 793 for expression (jobinterrupt == outofband)
Assert failed in V71002/sr_port/mdb_condition_handler.c line 793 for expression (jobinterrupt == outofband)
List of YottaDB assert failures
Assert failed in V999_R139/sr_port/deferred_events.c line 145 for expression ((xfer_table[xf_linefetch] == op_linefetch) || (xfer_table[xf_linefetch] == op_zstepfetch) || (xfer_table[xf_linefetch] == op_zst_fet_over) || (xfer_table[xf_linefetch] == op_mproflinefetch))
Assert failed in V999_R139/sr_port/deferred_events.c line 272 for expression ((not_in_play == entry->event_state) || (queued == entry->event_state))
Assert failed in V999_R139/sr_port/deferred_events.c line 279 for expression (not_in_play == TAREF1(save_xfer_root, event_type).event_state)
Assert failed in V999_R139/sr_port/deferred_events.c line 392 for expression (pending >= TAREF1(save_xfer_root, outofband).event_state)
Since the user-visible impact has not changed between V7.0-000 and V7.0-001, this is a low-priority YDB issue for now. I will report this to GT.M and hope that they fix it in the near future (so that the fix can in turn be merged into YDB at a later point).