CTDB tickle/connection tracking improvements and features
This is quite a big series, but I would be happy to accept RB+s for logical subsets. For example, the test fixes at the front, the ctdb-server clean ups (and debug level change), or the script/documentation clean ups. That will make the remainder smaller for others to review. Or, please feel free to review it all.
All of this is related to TCP connection tracking/tickles, but there are distinct parts:
- 3 fairly trivial fixes to the ss stub used in the event script unit tests.
- 6 script clean ups, reformatting and documentation improvements - all quite generic.
- 6 clean ups to the server code for tracking connections. I went there to change the debug level of a single log message, but stayed a while to make things better. Many of the log messages contained only a subset of the useful information (e.g. only one end of the connection being added), probably because ctdb_addr_to_str() uses a static buffer, so it can't be called twice in the same message. I followed the lead of some of the other code and pre-rendered the connection string to a buffer. For some reason, a couple of these debug messages logged the PNN, which would always be that of the current node, so it isn't useful information. Anyway, hopefully this sequence of commits makes the logs more useful and the code more comprehensible to new readers.
- 3 commits to move the monitor event based tracking of TCP connections from being NFS-only (i.e. port 2049) to 10.interface.script, and to handle all ports for currently hosted public IPs. They can't sanely be merged without at least 1 of the ctdb-server changes, because I really want to reduce that debug level for 1 message. 😉
- 2 commits to add support for using ss -K to terminate the server end of connections in releaseip, via a new script option CTDB_KILLTCP_USE_SS_KILL. The existing ctdb_killtcp has seen some reliability problems, and using ss -K seems like a better idea because it is supported and fast, even though it changes behaviour by doing a 2-way kill. CTDB_KILLTCP_USE_SS_KILL defaults to "no", so the default behaviour is unchanged. After this has seen some real-world testing, I'd like to at least change the default value to "try". We can't do more than this until the Linux kernel CONFIG_INET_DIAG_DESTROY option is universally enabled in distro kernels (this is much closer than it was a couple of years ago).
Checklist
- Commits have Signed-off-by: with name/author being identical to the commit author
- (optional) This MR is just one part towards a larger feature.
- (optional, if backport required) Bugzilla bug filed and BUG: tag added
- Test suite updated with functionality tests
- Test suite updated with negative tests
- Documentation updated
- CI timeout is 3h or higher (see Settings/CICD/General pipelines/Timeout)
Reviewer's checklist:
- There is a test suite reasonably covering new functionality or modifications
- Function naming, parameters, return values, types, etc., are consistent and according to README.Coding.md
- This feature/change has adequate documentation added
- No obvious mistakes in the code
Edited by Martin Schwenke