Avoid SI replication REPLINSTNOHIST after Receiver Server shutdown before any data is replicated
Final Release Note
A Receiver Server on a Supplementary Instance starts replicating after a prior Receiver Server was shut down after it connected to a Source Server for the first time but before it received any data. Previously, a receiver startup in this case could incorrectly terminate with a REPLINSTNOHIST error. [#361 (closed)]
Description
This was observed in internal testing. The manually_start/4g_journal subtest failed an assert in the receiver server when run with the debug build of YottaDB.
%YDB-F-ASSERT, Assert failed in sr_unix/gtmrecv_process.c line 1389 for expression (strm_jnl_seqno || !inst_hdr->is_supplementary || remote_side_is_supplementary)
In the failure case, the A->P connection (where A is the non-supplementary source side and P is the supplementary receiver side) was being established for the very first time. Because this was the first connection, -updateresync and -initialize were used in the receiver server startup.
In the middle of the initial handshake between the source and receiver servers, the receiver server was shut down.
Had the receiver server not been shut down, its log would normally contain the following 4 messages.
Received REPL_WILL_RESTART_WITH_INFO message with seqno 1 [0x1]
Wrote upd_proc_local->read_jnl_seqno : 1 [0x1]
REPL INFO - Seqno : 1 [0x1] Jnl Total : 168 [0xa8] Msg Total : 368 [0x170] Current backlog : 0 [0x0]
New History Content : Start Seqno = 1 [0x1] : Stream Seqno = 0 [0x0] : Root Primary = [INSTA] : Cycle = [2] ...
Because of the shutdown, only the first 2 messages appeared; the last 2 did not. That is, a new history record was never added to the replication instance file on P.
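As a diagnostic illustration only, the check described above (did the log reach the "New History Content" message?) can be sketched in C. The helper names `handshake_messages_seen` and `history_record_logged` are hypothetical and are not part of YottaDB; the message prefixes are copied from the log lines quoted in this issue, and the sketch assumes the messages appear in that order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Message prefixes copied from the receiver server log lines quoted in this issue. */
static const char *const HANDSHAKE_MARKERS[] = {
    "Received REPL_WILL_RESTART_WITH_INFO message",
    "Wrote upd_proc_local->read_jnl_seqno",
    "REPL INFO - Seqno",
    "New History Content",
};
#define MARKER_COUNT (sizeof HANDSHAKE_MARKERS / sizeof HANDSHAKE_MARKERS[0])

/* Hypothetical helper: counts how many of the expected messages appear,
 * in order, in the given log text. Stops at the first one that is missing. */
static size_t handshake_messages_seen(const char *log_text)
{
    size_t seen = 0;
    const char *pos = log_text;

    while (seen < MARKER_COUNT) {
        const char *hit = strstr(pos, HANDSHAKE_MARKERS[seen]);
        if (hit == NULL)
            break;
        pos = hit + strlen(HANDSHAKE_MARKERS[seen]);
        seen++;
    }
    return seen;
}

/* Hypothetical helper: true only if all expected messages, including
 * "New History Content", were logged. */
static bool history_record_logged(const char *log_text)
{
    return handshake_messages_seen(log_text) == MARKER_COUNT;
}
```

For a log that stops after the first 2 messages, as in the failure case, `handshake_messages_seen` returns 2 and `history_record_logged` returns false.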
In this state, when the receiver server was restarted (this time without -updateresync or -initialize, because this was not the first startup), it failed the assert shown above.
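To make the failing condition concrete, the assert expression quoted earlier can be evaluated in isolation. This is a minimal sketch, not the code in sr_unix/gtmrecv_process.c: the struct and function names are hypothetical stand-ins, and the assumption that strm_jnl_seqno is 0 on the restarted receiver (because no history record was ever written) is an inference from the description, not something the issue states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three operands of the assert expression.
 * The real variables live in sr_unix/gtmrecv_process.c. */
struct assert_inputs {
    uint64_t strm_jnl_seqno;               /* ASSUMED to be 0 when no history record exists */
    bool     is_supplementary;             /* inst_hdr->is_supplementary (receiver side, P) */
    bool     remote_side_is_supplementary; /* source side (A) */
};

/* Mirrors the quoted expression:
 * (strm_jnl_seqno || !inst_hdr->is_supplementary || remote_side_is_supplementary) */
static bool assert_expr_holds(const struct assert_inputs *in)
{
    return in->strm_jnl_seqno != 0
        || !in->is_supplementary
        || in->remote_side_is_supplementary;
}
```

With a supplementary receiver (P), a non-supplementary source (A), and a zero strm_jnl_seqno, all three terms are false, so the expression does not hold. Any one of a non-zero strm_jnl_seqno, a non-supplementary receiver, or a supplementary source would satisfy it.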
In a production build, the receiver server would have failed with a REPLINSTNOHIST error. This issue is very similar to GTM-8730, which was fixed in GT.M V6.3-002 (release note at http://tinco.pair.com/bhaskar/gtm/doc/articles/GTM_V6.3-002_Release_Notes.html#GTM-8730). It is an edge case that is very unlikely to be encountered in practice, since one usually lets the first-ever connection between a primary and secondary run for a while before shutting down the receiver side.
Draft Release Note
Receiver server startup on a supplementary instance works appropriately after the prior receiver server was shut down after connecting to a non-supplementary source side for the first time. Previously, a receiver startup in this case could incorrectly terminate with a REPLINSTNOHIST error if the prior receiver server was shut down after the initial handshake with the source side but before even one update was sent across. (GTM-8730)