[#869] Fix SIG-11 (instead of NUMOFLOW error) in boolean expressions when huge numerics (e.g. 1E47) are used
Background
-
Below is pasted from #869 (closed)
-
This is an issue that was found by fuzz testing. Various expressions end up in SIG-11 in Release builds and Assert failures in Debug builds. This happens only in YottaDB (and from r1.24 release onwards). Not in GT.M.
-
In GT.M though, I expect to see a
NUMOFLOW
error but see the expression evaluated fine which makes me think there is a correctness issue there. Not sure yet. -
Below are the failure details.
-
Original test case that failed during fuzz testing
Debug build
YDB>write $select(mmfncd'='"77777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") %YDB-E-COLON, Colon (:) expected in this context write $select(mmfncd'='"77777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") ^----- YDB>write $select(mmfncd'='"777777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") %YDB-F-ASSERT, Assert failed in sr_port/ex_tail.c line 82 for expression ((OC_COMVAL == t2->opcode) || (OC_COMINT == t2->opcode))
Release build
YDB>write $select(mmfncd'='"77777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") %YDB-E-COLON, Colon (:) expected in this context write $select(mmfncd'='"77777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") ^----- YDB>write $select(mmfncd'='"777777777777777777777777777777777777777777777777:("rite>"_mmfncd),1:"no got it") %YDB-F-KILLBYSIGSINFO1, YottaDB process 6822 has been killed by a signal 11 at address 0x00007F0C4E7502F5 (vaddr 0x0000000000000060) %YDB-F-SIGMAPERR, Signal was caused by an address not mapped to an object
-
Simpler test case that shows the same issue
Debug build
YDB>write x='"1E47" %YDB-F-ASSERT, Assert failed in sr_port/ex_tail.c line 82 for expression ((OC_COMVAL == t2->opcode) || (OC_COMINT == t2->opcode))
Release build
YDB>write x='"1E47" %YDB-F-KILLBYSIGSINFO1, YottaDB process 82048 has been killed by a signal 11 at address 0x00007FF7D3FA54AC (vaddr 0x0000000000000060) %YDB-F-SIGMAPERR, Signal was caused by an address not mapped to an object
-
Another test case with different symptom. This fails only in Debug builds. Runs fine in Release builds.
Debug build
YDB>write 0='"1E47" %YDB-F-ASSERT, Assert failed in sr_port/bx_boollit.c line 206 for expression ((OC_NEG != ref0->opcode) && (OC_FORCENUM != ref0->opcode) && (OC_COM != ref0->opcode))
-
A few more test cases that failed.
Debug build
YDB>set x=(1!$s($$^truZQUIT='"1E47":0,1:1)) %YDB-F-ASSERT, Assert failed in sr_port/ex_tail.c line 160 for expression ((OC_COMVAL == t2->opcode) || (OC_COMINT == t2->opcode)) YDB>set (z,$iv)=(1!$s($$^truZQUIT='"1E47":"ok",1:1)) %YDB-F-ASSERT, Assert failed in sr_port/ex_tail.c line 82 for expression ((OC_COMVAL == t2->opcode) || (OC_COMINT == t2->opcode))
-
More tests that fail
Debug build
YDB>i i>1,$zbitfind(b2,'+"1E47"=i w !,"oops" %YDB-F-ASSERT, Assert failed in sr_port/ex_tail.c line 82 for expression ((OC_COMVAL == t2->opcode) || (OC_COMINT == t2->opcode)) YDB>w '+"1E47"=1 %YDB-F-ASSERT, Assert failed in sr_port/bx_boollit.c line 206 for expression ((OC_NEG != ref0->opcode) && (OC_FORCENUM != ref0->opcode) && (OC_COM != ref0->opcode))
-
Test cases (1), (2), (3) and (4) started failing from YottaDB r1.24 only. That is, they worked fine in YottaDB r1.22.
-
I suspect the cause of the regression is the below commit.
- 26173873; 2018-10-03 15:42:47 -0400; Narayanan Iyer; [#363 (closed)] Compiler returns incorrect results in cases where it should issue NUMOFLOW
-
Test case (5) though fails even in YottaDB r1.22. But only in Debug builds. Below is the output in that release.
Debug build
YDB>i i>1,$zbitfind(b2,'+"1E47"=i w !,"oops" %YDB-F-ASSERT, Assert failed in sr_port/coerce.c line 47 for expression (MV_NM & ref->operand[0].oprval.tref->operand[0].oprval.mlit->v.mvtype)
Release build
YDB>i i>1,$zbitfind(b2,'+"1E47"=i w !,"oops" %YDB-E-RPARENMISSING, Right parenthesis expected i i>1,$zbitfind(b2,'+"1E47"=i w !,"oops" ^-----
Issue
-
In case of an expression like
write x='"1E47"
, there would be anOC_LIT
triple corresponding to the literal"1E47"
. There would then be anOC_COM
triple to implement the complement operator ('
) on theOC_LIT
triple. And there would then be aOC_COMVAL
triple to convert the boolean result into an mval for later use in theOC_EQU
triple (that implements the=
operator). -
The
"1E47"
literal cannot be computed at compile time as it has aNUMOFLOW
error which will be issued at runtime. Therefore, literal optimization should not continue if it encounters literals that would end up with aNUMOFLOW
error. But this was not the case in the code. -
So the literal optimization code in
sr_port/bx_boollit.c
incorrectly assumed the'"1E47"
computation can be done at compile time and removed the unnecessaryOC_COMVAL
triple (using adqdel()
) which later confused the logic inex_tail.c
and failed an assert where it was expecting aOC_COMVAL
orOC_COMINT
triple.
Fix
-
In
sr_port/bx_boollit.c
, we now check if a literal is a number that evaluates to aNUMOFLOW
error and if the unary operation that is being performed on it is aOC_COM
(unary'
operator),OC_NEG
(unary-
operator) orOC_FORCENUM
(unary+
operator). If so, we return after issuing aNUMOFLOW
error as literal optimization cannot be performed. This avoids thedqdel()
(which happens later in the same function) from happening thereby avoiding the assert failure. -
Because of the newly added check (in the previous bullet), a pre-existing similar check a few lines later (right after a
s2n()
call) is now replaced with an assert since we would never reach here in case of a NUMOFLOW error (we would have returned from the newly added check a few lines above). -
An unrelated change to
sr_port/bx_boollit.c
was to remove the lineneg = num = 0;
as I did not see it do anything (neg
andnum
are set to a value later anyways). This fixed 3 warnings from clang-tidy and so the following reference files were updated to remove those 3 warning lines.- ci/tidy_warnings_release.ref
- ci/tidy_warnings_debug.ref