Removed a rare case of SIGSEGV (SIG-11) for MM databases with database file extensions and concurrent process termination with SIGKILL (SIG-9)
Final Release Note
Updates to database files that use the MM access method work correctly when there are concurrent database file extensions and termination of processes accessing the database using SIGKILL (SIG-9). Previously, processes updating database files under these conditions could in rare cases terminate with a SIGSEGV (SIG-11). Note that YottaDB strongly recommends against the use of SIGKILL to terminate processes, except as a last resort. This was only encountered in a development environment and not reported by a user. [#651 (closed)]
Description
This is an issue that was noticed in an internal test (recov_0/D9E04002440_nobefore) where a process terminated abnormally with a SIG-11. Below is the failure diff.
> recov_0/D9E04002440_nobefore/impjob_imptp0.mje5
> %YDB-F-KILLBYSIGSINFO1, YottaDB process 64266 has been killed by a signal 11 at address 0x00007FA163AD1C94 (vaddr 0x00007FA147B62408)
The SIG-11 happened at line 872 below when trying to dereference t1->buffaddr. It was pointing to an invalid address.
t_end.c
--------
870 if (is_mm)
871 {
872 if (t1->tn <= ((blk_hdr_ptr_t)(t1->buffaddr))->tn)
873 {
After some analysis, the suspicion is that the MM database file got extended concurrently and this process updated its csa->db_addrs[0] and csa->db_addrs[1] to point to the newly mapped database file range, but t1->buffaddr was still pointing to the older mapped database file (which was unmapped in gds_map_moved()).
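The suspected failure mode, a cached pointer outliving the mapping it pointed into, can be illustrated with a small standalone sketch. This is not the YottaDB code and not necessarily how the issue was fixed. The names mapping_t, cached_ref_t, in_mapping, extend_mapping and demo_rebase are hypothetical, and heap buffers stand in for the memory-mapped database file. The sketch only shows that every cached address has to be rebased to the same offset in the new range before the old range is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mapping bounds (compare csa->db_addrs[0] and [1]). */
typedef struct {
    unsigned char *start;   /* first byte of the current mapping */
    unsigned char *end;     /* one past the last byte of the current mapping */
} mapping_t;

/* Hypothetical stand-in for a cached block address (compare t1->buffaddr). */
typedef struct {
    unsigned char *buffaddr;
} cached_ref_t;

/* Return 1 if p lies inside the current mapping, 0 otherwise. */
int in_mapping(const mapping_t *m, const unsigned char *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)m->start && a < (uintptr_t)m->end;
}

/*
 * Simulate a file extension that moves the mapping: allocate a larger
 * region, copy the old contents, rebase every cached reference to the
 * same offset in the new region, and only then release the old region.
 * Returns 0 on success, -1 on failure (the mapping is left unchanged).
 */
int extend_mapping(mapping_t *m, size_t new_size, cached_ref_t *refs, size_t nrefs)
{
    size_t old_size = (size_t)(m->end - m->start);
    unsigned char *new_start;
    size_t i;

    if (new_size < old_size)
        return -1;
    new_start = calloc(new_size, 1);
    if (new_start == NULL)
        return -1;
    memcpy(new_start, m->start, old_size);
    for (i = 0; i < nrefs; i++) {
        /* A reference outside the old range would already be stale. */
        assert(in_mapping(m, refs[i].buffaddr));
        refs[i].buffaddr = new_start + (refs[i].buffaddr - m->start);
    }
    free(m->start);         /* plays the role of unmapping the old range */
    m->start = new_start;
    m->end = new_start + new_size;
    return 0;
}

/* Cache a pointer, extend the mapping, then read through the rebased pointer. */
int demo_rebase(void)
{
    mapping_t map;
    cached_ref_t ref;
    int value;

    map.start = calloc(64, 1);
    if (map.start == NULL)
        return -1;
    map.end = map.start + 64;
    map.start[16] = 0x5A;
    ref.buffaddr = map.start + 16;

    if (extend_mapping(&map, 128, &ref, 1) != 0) {
        free(map.start);
        return -1;
    }
    /* Dereference only after confirming the pointer is inside the new range. */
    value = in_mapping(&map, ref.buffaddr) ? *ref.buffaddr : -1;
    free(map.start);
    return value;
}
```

In this sketch, skipping the rebase loop would leave ref.buffaddr pointing into the freed region, which is the analogue of t1->buffaddr pointing into the unmapped range at line 872.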
Draft Release Note
Updates to database files with an access method of MM work fine even if database file extensions happen and processes are abnormally terminated (e.g. kill -9). Previously, in rare cases, processes could terminate abnormally with a SIG-11 when they read or wrote an MM database just after another process was killed using kill -9. Note that terminating YottaDB processes using kill -9 is not recommended.