Transactions generating >1GiB logical update journal records abort with TRANSREPLJNL1GB error

Final Release Note

Transactions that generate more than 1GiB of logical update journal records abort with a TRANSREPLJNL1GB error. Previously, such transactions would terminate the process with a %YDB-F-MEMORY or %YDB-F-GTMASSERT2 error, or raise errors such as %YDB-E-JNLCNTRL or %YDB-E-REPLJNLCLOSED. That last error would turn off replication in a replicated instance, requiring a new replicating secondary instance to be created from a backup of another instance. Since an ACID transaction is normally a conceptually integral, atomic unit of work, a transaction that generates more than 1GiB of logical update records may be buggy or inappropriately designed. The most plausible application scenario where a transaction is genuinely likely to generate such large volumes of journal update records is one that sets multiple nodes with large values. For such applications, consider compressing the data stored in the global variable nodes. M applications can use the zlib plugin to compress data; other languages can call compression libraries directly. [#749 (closed)]
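As a rough illustration of the compression suggestion above, here is a minimal Python sketch using the standard library `zlib` module. The helper `m_justify` is hypothetical and only mimics the space-padded values that `$j(i,1000)` produces in the tests in this issue. It does not use the YottaDB zlib plugin or any YottaDB API, and real application data will usually compress far less well than padding does.

```python
import zlib

def m_justify(value, width):
    """Hypothetical stand-in for M's $justify(value,width): right-justify with spaces."""
    return str(value).rjust(width)

# A value shaped like the ones set in the tests in this issue: 1000 bytes, mostly padding.
raw = m_justify(123456, 1000).encode("ascii")

# Compress before storing in a global variable node; decompress after reading it back.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

print(f"raw bytes: {len(raw)}, compressed bytes: {len(packed)}")
```

Storing `packed` instead of `raw` reduces the bytes written per update, which in turn reduces the journal record volume a transaction generates. The application has to decompress on every read, so this trades CPU time for journal and database space.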

Description

Below are various issues that were noticed while trying to do HUGE transactions (i.e. more than a million updates to a journaled database file inside one TSTART/TCOMMIT transaction).

Test 1

$ cat > x.m << CAT_EOF
        tstart ():serial
        for i=1:1:2100000 set ^x=$j(i,1000)
        tcommit
CAT_EOF
$ yottadb -run x
$

The above yottadb command returns to the shell prompt without error, but I see the source server fail with a GTMASSERT2 fatal error. The error shows up in the syslog as well as in the source server log. I am pasting the source server log below.

Fri Jun 25 15:12:51 2021 : Source server now reading from journal files; journal pool overflow detected at seqno 1 [0x1]
%YDB-F-GTMASSERT2, YottaDB r999 Linux x86_64 - Assert failed sr_unix/gtmsource_process_ops.c line 973 for expression (0 < *data_len)
Fri Jun 25 15:12:56 2021 : Source server exiting...

Test 2

$ cat > x.m << CAT_EOF
        tstart ():serial
        for i=1:1:4200000 set ^x=$j(i,1000)
        tcommit
CAT_EOF
$ yottadb -run x
%YDB-F-MEMORY, Central memory exhausted during request for 18446744056529682448 bytes from 0x00007F0D6604E8A0
%SYSTEM-E-ENO12, Cannot allocate memory

The above error shows up in the yottadb process itself.

Test 3

$ cat > x.m << CAT_EOF
        for iters=1:1:3 do
        . tstart ():serial
        . for i=1:1:1500000 set ^x=$j(i,1000)
        . tcommit
CAT_EOF
$ yottadb -run x

The above yottadb command runs fine and returns to the shell, but I see the following JNLCNTRL message in the syslog. In case of a replicated environment, one also sees a REPLJNLCLOSED message which is very unfriendly as it closes replication on that database file.

Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-I-JNLSENDOPER, pid = 0x0000FBB6 : status = 0x08F691A2 : jpc_status = 0x00000000 : jpc_status2 = 0x00000000 : iosb.cond = 0x0000, %YDB-E-JNLCNTRL, Journal control unsynchronized for yottadb.mjl.
Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-I-JNLBUFINFO, Pid 0x0000FBB6 dsk 0x0006AA20 free 0x00017750 bytcnt 0x1D8875A8 io_in_prog 0x00000000 fsync_in_prog 0x00000000 dskaddr 0xFFF6AA20 freeaddr 0x1D897750 qiocnt 0x00001C92 now_writer 0x00000000 fsync_pid 0x00000000 filesize 0x00609800 cycle 0x00000002 errcnt 0x00000000 wrtsize 0x0006AA20 fsync_dskaddr 0xBE5BF548 rsrv_free 0x00017750 rsrv_freeaddr 0x1D897750 phase2_commit_index1 0x00000004 phase2_commit_index2 0x00000004 next_align_addr 0x1D9FFFF0 size 0x00000000
Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-I-JNLPVTINFO, Pid 0x0000FBB6 cycle 0x00000002 fd_mismatch 0x00000000 channel 0x00000004 sync_io 0x00000000 pini_addr 0x000101A8 qio_active 0x00000000 old_channel 0x00000000
Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-E-REPLJNLCLOSED, Replication in jeopardy as journaling got closed for database file yottadb.dat. Current region seqno is 4 [0x0000000000000004] and system seqno is 4 [0x0000000000000004] -- generated from 0x0000000000000000.
Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-I-WCSFLUFAILED, File sr_unix/wcs_flu.c, Line 472 error while flushing buffers at transaction number 0x0000000000000004 for database file yottadb.dat, %YDB-E-JNLCNTRL, Journal control unsynchronized for . -- generated from 0x0000000000000000.
Jun 25 15:10:58 tyntmeadow YDB-YOTTADB-INSTA[64438]: %YDB-I-JNLFLUSH, Error flushing journal buffers to journal file yottadb.mjl -- generated from 0x0000000000000000.

All the above errors are incorrect and user-unfriendly.
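For a sense of scale, the following Python sketch estimates a lower bound on the journal volume of each test transaction. It counts only the 1000-byte values produced by `$j(i,1000)` and ignores all per-record journal overhead, so the real volumes are larger. The names `GIB`, `VALUE_BYTES`, and `payload_lower_bound` are made up for this sketch.

```python
GIB = 1024 ** 3      # bytes in 1 GiB
VALUE_BYTES = 1000   # $j(i,1000) yields a 1000-byte string for the loop counts used here

# Number of SET commands inside one TSTART/TCOMMIT in each test.
tests = {
    "Test 1": 2_100_000,
    "Test 2": 4_200_000,
    "Test 3 (per transaction)": 1_500_000,
}

def payload_lower_bound(sets, value_bytes=VALUE_BYTES):
    """Bytes of value payload alone; journal record headers are not counted."""
    return sets * value_bytes

for name, sets in tests.items():
    size = payload_lower_bound(sets)
    print(f"{name}: at least {size / GIB:.2f} GiB, exceeds 1 GiB: {size > GIB}")
```

Even by this conservative count, every transaction in the three tests carries more than 1GiB of value data.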

As long as all the updates in the transaction will fit in a journal file, YottaDB should either fit them in the current journal file or automatically switch to a new journal file and fit them there. If they won't fit even in the maximum possible journal file size (the autoswitchlimit value, with a hard stop at 4GiB), it should issue a more friendly JNLTRANS2BIG error that prevents the transaction commit from even starting.

Draft Release Note

Transactions that perform a lot of updates (for example, doing a million SET commands inside a TSTART/TCOMMIT fence in an M program) either finish as expected or issue a TRANSREPLJNL1GB error and abort the transaction. Previously, it was possible for such transactions to fail with various types of incorrect (and user-unfriendly) errors including %YDB-F-MEMORY, %YDB-F-GTMASSERT2, %YDB-E-JNLCNTRL and %YDB-E-REPLJNLCLOSED. The last error would even turn off replication, which could require shipping a new backup of the source side database files to the receiver side in a replicated environment. [#749 (closed)]

Edited Jul 01, 2021 by K.S. Bhaskar