Certain cases of invalid collation specification in directory tree records handled correctly
Final Release Note
When a directory tree record has an invalid collation specification (a structural error), MUPIP INTEG issues an INVSPECREC error with appropriate context (block number and offset where the error was noticed), while DSE and application programs assume no collation and proceed. Previously, in some cases of an INVSPECREC error, it was possible for all the above commands to loop forever. [#346 (closed)]
Description
The directory tree has a record for each global name. If a global name has collation characteristics, its record also contains collation information. The collation information is currently 4 bytes long and follows the 4-byte root block number in the record. The first of these 4 bytes is usually 1. Any value other than 1 is an integrity error, since 1 is the only currently supported value. If the first byte is greater than 2, an INVSPECREC error is currently issued, but if it is exactly 2, no INVSPECREC error is issued. In that case it is possible (depending on the build memory layout) for various commands to loop infinitely, as illustrated below.
> cat x.csh
set verbose
setenv ydb_gbldir mumps.gld
rm mumps.gld; $ydb_dist/mumps -run GDE exit
rm mumps.dat; $ydb_dist/mupip create
$ydb_dist/mumps -run ^%XCMD 'set ^x=0 kill ^x set x=$$set^%GBLDEF("^x",0,1)'
$ydb_dist/dse dump -block=2
$ydb_dist/dse overwrite -block=2 -offset=1b -data=\\2
$ydb_dist/mumps -run ^%XCMD 'set x=$get(^x)'
$ydb_dist/dse dump -block=2
$ydb_dist/mupip integ mumps.dat
In the above script, after the DSE OVERWRITE command, each of the 3 commands that follow can spin loop (i.e. never return). This is a longstanding bug. It is not usually seen in practice, since it requires a broken database. Nevertheless, the tools used when a database is found to be broken (i.e. MUPIP INTEG and DSE) must not loop infinitely.
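When reproducing the hang, it can help to bound each command so that the reproduction itself terminates. The following is a minimal sketch, not part of the original script: it assumes a POSIX shell (the script above is csh) and the GNU coreutils `timeout` utility, and `run_bounded` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report a hang.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# time limit expires before the command finishes.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    exit_code=$?
    if [ "$exit_code" -eq 124 ]; then
        echo "HANG: '$*' did not return within ${limit}s" >&2
    fi
    return "$exit_code"
}
```

For example, `run_bounded 30 "$ydb_dist/mupip" integ mumps.dat` would return 124 and print a HANG message if MUPIP INTEG spin loops on the corrupted record; the 30-second limit is an arbitrary choice.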
The fixes needed are
- DSE, MUPIP INTEG and MUMPS must not loop infinitely when the first byte of the 4-byte collation record contains 0x2.
- MUPIP INTEG should issue a user-friendly error indicating the block, record and offset where the error is detected in this case.
- DSE DUMP should not hang or error out while trying to fetch the corrupted 4-byte collation record. DSE DUMP is the only way to find out what bytes exist in the record, so that DSE OVERWRITE (or similar) commands can then be constructed to fix the broken record.
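The difference between the check described above and a stricter one can be sketched as follows. This only models the rule as stated in this description (values greater than 2 are reported, while 1 is the only supported value); the function names are hypothetical, and the actual YottaDB code and fix may be structured differently.

```shell
#!/bin/sh
# Check as described for the previous behavior: only values greater
# than 2 are reported, so a first byte of 2 slips through.
old_check() {
    if [ "$1" -gt 2 ]; then echo INVSPECREC; else echo OK; fi
}

# Stricter check: 1 is the only supported value, so anything else
# is reported as INVSPECREC.
strict_check() {
    if [ "$1" -eq 1 ]; then echo OK; else echo INVSPECREC; fi
}

old_check 2      # prints OK (the corruption goes unnoticed)
strict_check 2   # prints INVSPECREC
strict_check 1   # prints OK
```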
Draft Release Note
When a directory tree record containing global collation information is corrupt, MUPIP INTEG issues an INVSPECREC error with appropriate context (block number and offset where the error was noticed), while MUMPS and DSE DUMP no longer issue an INVSPECREC error but instead assume no collation and proceed. Previously, in some cases of an INVSPECREC error, it was possible for all the above commands to spin loop (i.e. hang indefinitely).