CTDB tickle/connection tracking improvements and features

This is quite a big series but I would be happy to accept RB+s for logical subsets. For example, the test fixes at the front, the ctdb-server clean ups (and debug level change), or the script/documentation clean ups. That will make the remainder smaller for others to review. Or, please feel free to review it all. 😃

All of this is related to TCP connection tracking/tickles but there are distinct parts:

  • 3 fairly trivial fixes to the ss stub used in the event script unit tests.
  • 6 script clean ups, reformatting and documentation improvements - all quite generic.
  • 6 clean ups to the server code for tracking connections. I went there to change the debug level of a single log message, but stayed there a while to make things better. Many of the log messages contained only a subset of the useful information (i.e. only one end of the connection being added, probably because ctdb_addr_to_str() uses a static buffer, so it can't be called twice in the same message), so I followed the lead of some of the other code and pre-rendered the connection string to a buffer. For some reason, a couple of these debugs logged the PNN, which would always be the current node, so it isn't useful information. Anyway, hopefully this sequence of commits makes the logs more useful and the code more comprehensible to new readers.
  • 3 commits to move the monitor event based tracking of TCP connections from being NFS-only (i.e. port 2049) to 10.interface.script and handle all ports for currently hosted public IPs. They can't sanely be merged without at least 1 of the ctdb-server changes because I really want to reduce that debug level for 1 message. 😉
  • 2 commits to add support for using ss -K to terminate the server end of connections in releaseip via new script option CTDB_KILLTCP_USE_SS_KILL. The existing ctdb_killtcp has seen some reliability problems and using ss -K seems like a better idea, because it is supported and fast, even though it changes behaviour by doing a 2-way kill. CTDB_KILLTCP_USE_SS_KILL defaults to no, so the default behaviour is unchanged. After this has seen some real world testing, I'd like to at least change the default value to try - we can't do more than this until the Linux kernel CONFIG_INET_DIAG_DESTROY option is universally enabled in distro kernels (this is much closer than it was a couple of years ago).
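
To illustrate the static-buffer point from the server clean ups: the following is a minimal sketch, not the CTDB code. The `struct endpoint` type and the `endpoint_to_str_static()`, `render_broken()` and `render_connection()` helpers are hypothetical stand-ins invented for this example. It only assumes a formatter that returns a pointer to its own static buffer, as described above for ctdb_addr_to_str().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical endpoint type, for illustration only (not a CTDB type). */
struct endpoint {
	const char *ip;
	unsigned port;
};

/* Stand-in for a formatter that returns a pointer to a static buffer. */
static const char *endpoint_to_str_static(const struct endpoint *e)
{
	static char buf[64];

	snprintf(buf, sizeof(buf), "%s:%u", e->ip, e->port);
	return buf;
}

/*
 * Broken: both calls return the same pointer, so both %s conversions
 * print whichever endpoint happened to be formatted last.
 */
static void render_broken(char *out, size_t len,
			  const struct endpoint *src,
			  const struct endpoint *dst)
{
	snprintf(out, len, "%s -> %s",
		 endpoint_to_str_static(src),
		 endpoint_to_str_static(dst));
}

/*
 * Pre-render: copy the first result into a local buffer before the
 * static buffer is overwritten by the second call.
 */
static void render_connection(char *out, size_t len,
			      const struct endpoint *src,
			      const struct endpoint *dst)
{
	char src_str[64];

	assert(out != NULL && len > 0);
	snprintf(src_str, sizeof(src_str), "%s", endpoint_to_str_static(src));
	snprintf(out, len, "%s -> %s", src_str, endpoint_to_str_static(dst));
}
```

With `render_connection()` the resulting string carries both ends of the connection and can be passed to a single log call. With `render_broken()` the same end appears twice, and which end that is depends on the compiler's argument evaluation order.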

Checklist

  • Commits have Signed-off-by: with name/author being identical to the commit author
  • (optional) This MR is just one part towards a larger feature.
  • (optional, if backport required) Bugzilla bug filed and BUG: tag added
  • Test suite updated with functionality tests
  • Test suite updated with negative tests
  • Documentation updated
  • CI timeout is 3h or higher (see Settings/CICD/General pipelines/ Timeout)

Reviewer's checklist:

  • There is a test suite reasonably covering new functionality or modifications
  • Function naming, parameters, return values, types, etc., are consistent and according to README.Coding.md
  • This feature/change has adequate documentation added
  • No obvious mistakes in the code
Edited by Martin Schwenke
