Limit the max age of a TLS keepalive connection

Stan Hu requested to merge sh-limit-tls-connection-age into main

Why was this MR needed?

The Runner keeps the default DisableKeepAlive setting of false, so API requests to POST /api/v4/jobs/request reuse the same connection before and after jobs run. Until now, this connection appeared to live indefinitely, and such a long-lived connection can cause a number of problems:

  1. When TLS certificates were rotated on GitLab.com, existing connections continued to use the old certificates to populate CI_SERVER_TLS_CA_FILE for Git clones. Limiting the connection age to 15 minutes forces the Runner to reconnect and pick up the latest certificates.

  2. As https://github.com/golang/go/issues/54429 describes, Web services may scale up over time and distribute the load. Long-lived connections can prevent connections from being evenly distributed.

This commit also adds a connection_max_age setting. If no value is specified, the default of 15 minutes is used.

When the max age is reached, the Runner calls CloseIdleConnections(). This closes only idle connections, so a reconnection is forced once no network calls are in flight. Once https://github.com/golang/go/issues/54429 is implemented, we could avoid the need to manage this timer.

What's the best way to test this MR?

With main branch

  1. Register a runner with gitlab.com in your config.toml.
  2. Run tcpdump -i <interface> -w /tmp/gitlab1.pcap host gitlab.com.
  3. Run the runner: ./out/binaries/gitlab-runner run --config config.toml
  4. Wait 5-10 minutes and press CTRL-C for both tcpdump and gitlab-runner.
  5. Open the capture with wireshark /tmp/gitlab1.pcap. Click the first Client Hello message, then right-click and choose Follow -> TCP Stream.
  6. Sort by Info.

You should see many Client Hello messages.

With this branch

  1. Check out this branch and compile (make runner-bin-host)
  2. In config.toml add connection_max_age = "1s".
  3. Run tcpdump -i <interface> -w /tmp/gitlab2.pcap host gitlab.com.
  4. Re-run the runner: ./out/binaries/gitlab-runner run --config config.toml
  5. Repeat steps 4-6 and confirm that there is only one Client Hello message for the first TCP stream.

What are the relevant issue numbers?

Relates to #37275 (closed)

Edited by Stan Hu
