Signal handling changes to better support Go wrapper
Final Release Note
Description
This issue addresses problems a user has encountered with signal handling in the Go wrapper. Primarily, their system runs in Kubernetes Pods, where the standard shutdown is a SIGTERM signal that the Go wrapper must handle gracefully; currently it panics.
The designed changes were broken into 3 subtasks:
- Add the ability for a user to register a handler for a signal that is driven either before or after the YDB handler. Note that routines registered to run AFTER the YDB handler have a high probability of never being driven for fatal signals, because YDB's normal path is to shut the process down. This allows certain cleanup functions to be driven at the time of the signal. This first step is now implemented as YottaDB/Lang/YDBGo#34 (closed), as it consisted entirely of wrapper changes. Two more tasks remain to completely address the issues the user saw.
- The second task is about dealing better with asynchronous signals (as defined by Go). These are signals that are SENT to the process by another process. This does NOT cover the synchronous signals (SIGSEGV, SIGBUS, SIGFPE), though I guess it does cover SIGILL since I don't see it mentioned as a potential synchronous signal. Synchronous signals are covered in the third task. For this task:
  - I have some changes for YottaDB and the wrapper together to change how signals are handled.
  - I will add a couple of new error codes, but the one the user will mostly see is YDB_FATAL_ERROR_BUBBLE (name subject to change). When this error is returned from a call, the user knows to return immediately to the caller with that same return code. In this fashion, we bubble out through the call chain, unwinding things as we go. The intent is to bubble all the way out to the outer TP level and let the wrapper then throw a FATALERROR panic in the correct context so it can be caught if needed.
  - Inside YottaDB, I'd also like to separate the signals into the same synchronous and asynchronous categories that Go uses. Any signal caught via Go's notify is by definition not a fatal error (as far as the system is concerned); we'll still terminate cleanly, while allowing database activity prior to shutdown (i.e. it won't end YDB access automatically).
  - Additionally, I want to put a defer/recover in the code that drives each TP routine so we can catch panics and figure out a way to rethrow them once the call-back function returns to the wrapper.
  - The main purpose of these changes is to allow routines to terminate cleanly, without ugly panic/error messages, at least for asynchronous fatal signals.
  - This phase will get an issue # when it is started.
- The third and final phase is to deal better with synchronous signals (SIGSEGV, SIGBUS, and SIGFPE as defined by Go).
  - We've taken the first steps toward dealing with synchronous signals in step two by having all wrappers set up a defer/recover handler. This wrapper handler should (mostly) be able to catch these signals, but if they occur in a context without a defer/recover handler, they will cause the process to terminate. Providing that handler is up to the user: each goroutine that is launched needs its own defer/recover handler, as each runs in a different context. When we see/catch a synchronous signal, we will drive the appropriate YDB signal handler before or after driving any defined user handler.
  - I'll have more on this at a future point, but if a user's defer/recover catches a synchronous signal panic (Go automatically translates these signals to panics), it should return with the errmsg parm indicating this so the error can bubble out and be rethrown in the originating thread's context.
  - One of the drawbacks here is a problem we've seen before. A synchronous signal is delivered to the thread that caused the event, so it is handled in the context of whatever was running at the time (C or Go). If the signal happens inside YDB, then most typically the failing thread will also hold the engine lock. MOST of the time this will work in our favor, but there may be times when it does not, especially if the 'defer yottadb.Exit()' statement is missing from the main.
  - The purpose of these changes is to allow applications to terminate cleanly for synchronous fatal signals.
  - This last phase can use this issue# when implemented.
Draft Release Note
Edited by Narayanan Iyer