Fix gtm_tls_impl.c regression (in a7d6551e) related to ECONNRESET handling
Background
- As part of a prior commit (a7d6551e), the `ssl_error()` function was changed in the `SSL_ERROR_SYSCALL` case to not set `tls_errno` to `ECONNRESET` in case `errno` was 0. This is because OpenSSL 3.0 ensured that would never be the case.
- But as part of that change, the assignment `tls_errno = errno;` was also moved into code that executed only for pre-OpenSSL-3.0.
- This turned out to be incorrect, because it is still possible for OpenSSL 3.0 to return with an error code of `SSL_ERROR_SYSCALL`. All that we are guaranteed is that `errno` would not be `0` in that case. In case a connection got reset and a system call failed because of that, one would see an `errno` of `ECONNRESET` in OpenSSL 3.0 whereas one would see an `errno` of `0` in pre-OpenSSL-3.0.
- This scenario was observed in various replication tests in the YDBTest suite when they ran with TLS randomly enabled and shut the receiver server down while keeping the source server running. In that case, the source server would notice an error during the `send()` system call and get to `ssl_error()` with an error code of `SSL_ERROR_SYSCALL` and `errno` set to `ECONNRESET`. But since we incorrectly did nothing in that case for OpenSSL 3.0, `tls_errno` did not get set to `ECONNRESET`. That caused the replication source server logic (which invokes this reference implementation of the encryption plugin in `gtm_tls_impl.c`) to think no error occurred and retry the send indefinitely. The result was a spin-loop using up a full CPU and an ever-increasing source server log file with messages of the following form.

      Sat Apr 9 10:01:52 2022 : Returning err: 0
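
The failure mode described in the background can be illustrated with a small standalone sketch. It does not reproduce the actual `ssl_error()` code or the source server loop: the function names, the `pre_openssl_3` flag (standing in for a compile-time OpenSSL version check), the initial `tls_errno` value of 0, and the attempt cap are all assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical model of the regressed SSL_ERROR_SYSCALL handling:
 * the assignment from errno happens only on the pre-OpenSSL-3.0 path.
 * Assumption: tls_errno starts out as 0 (meaning "no error").
 */
int regressed_tls_errno(int saved_errno, int pre_openssl_3)
{
    int tls_errno = 0;

    if (pre_openssl_3) {
        tls_errno = saved_errno;
        if (0 == tls_errno)
            tls_errno = ECONNRESET;
    }
    /* For OpenSSL 3.0 and later nothing is stored, so 0 is returned. */
    return tls_errno;
}

/* Hypothetical caller: every send fails because the peer is gone.
 * The loop stops as soon as a nonzero error is reported, or after
 * max_attempts (the cap exists only so this sketch terminates).
 */
int count_send_attempts(int pre_openssl_3, int max_attempts)
{
    int attempts = 0;

    while (attempts < max_attempts) {
        /* Per the description above: a reset connection shows up as
         * errno 0 before OpenSSL 3.0 and as ECONNRESET from 3.0 on. */
        int saved_errno = pre_openssl_3 ? 0 : ECONNRESET;

        attempts++;
        if (0 != regressed_tls_errno(saved_errno, pre_openssl_3))
            break; /* error noticed, stop retrying */
    }
    return attempts;
}
```

With this model, `count_send_attempts(1, 1000)` stops after the first attempt, while `count_send_attempts(0, 1000)` runs until the cap, which mirrors the endless retry seen with OpenSSL 3.0.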
Fix
- The assignment `tls_errno = errno;` is now done for OpenSSL versions older than 3.0 as well as 3.0 and greater. It is only the check for `0 == tls_errno` (and the accompanying reset of `tls_errno` to `ECONNRESET`) that is now restricted to OpenSSL versions older than 3.0.
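
A minimal sketch of the corrected logic follows. It is not the code in `gtm_tls_impl.c`: the function name `fixed_tls_errno` and the runtime `pre_openssl_3` flag are hypothetical, and the real implementation presumably selects the branch at compile time from the OpenSSL version.

```c
#include <errno.h>

/* Hypothetical model of the fixed SSL_ERROR_SYSCALL handling.
 * saved_errno   : value of errno observed after the failed call
 * pre_openssl_3 : nonzero when modeling OpenSSL versions older than 3.0
 */
int fixed_tls_errno(int saved_errno, int pre_openssl_3)
{
    /* Done for every OpenSSL version. */
    int tls_errno = saved_errno;

    /* Only older versions can report errno 0 for a reset connection,
     * so only they need the mapping to ECONNRESET. */
    if (pre_openssl_3 && (0 == tls_errno))
        tls_errno = ECONNRESET;

    return tls_errno;
}
```

In this model a reset connection yields `ECONNRESET` on both paths, so a caller that treats a nonzero value as an error stops retrying.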