Roll out Cloud NAT to CI shared runners

Production Change - Criticality 2 C2

Change Objective: Roll out Cloud NAT to CI shared runners
Change Type: Type described above
Services Impacted: List services
Change Team Members: @hphilipps and @craigf
Change Severity: C2
Buddy check: A colleague will review the change
Tested in staging: Not tested in a non-prod environment, but private runners are already running without public IPs, behind Cloud NAT.
Schedule of the change: Probably the week beginning 30th September 2019.
Duration of the change: Quick to execute, rolls out slowly over several hours. Same for rollback.

Steps

  1. Ensure that a behavioural change in CI is communicated to customers ahead of time: jobs will no longer be able to open an arbitrarily large number of concurrent connections to the same address (IP:port). Initially, this concurrency will be limited to 256 over a 2-minute period, but we reserve the right to decrease it further. Concurrent connections to different addresses are unaffected.
  2. Merge https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1857 and run the manual apply-to-prod step in the master pipeline.
  3. Once the shared runner managers have completed a Chef run (up to 30 minutes later), all new docker-machine runners will be provisioned with no public IP, and their outbound traffic will automatically exit via the Cloud NAT.
  4. Check the NAT dashboard every few hours. There is no alerting in place, but close monitoring is not strictly necessary: elevated error rates (dropped packets) may occur if jobs make many concurrent connections to the same address, and this is by design.
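
For customers whose jobs fan out many requests to a single endpoint, the behavioural change in step 1 can be accommodated on the client side. The following is a minimal Python sketch, not part of this change, of capping concurrent connections per destination address. The class name `PerAddressLimiter`, the method `slot`, and the default cap of 64 are hypothetical choices for illustration; the only figure taken from this issue is the initial limit of 256, which the default is deliberately kept below.

```python
import threading
from contextlib import contextmanager

# Assumption: a cap well under the announced initial limit of 256
# connections to the same ip:port. Tune to the job's needs.
MAX_PER_ADDRESS = 64


class PerAddressLimiter:
    """Caps how many slots may be held at once for each (host, port) pair."""

    def __init__(self, max_per_address=MAX_PER_ADDRESS):
        self._max = max_per_address
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, address):
        # Create one semaphore per destination address, lazily.
        with self._lock:
            if address not in self._semaphores:
                self._semaphores[address] = threading.BoundedSemaphore(self._max)
            return self._semaphores[address]

    @contextmanager
    def slot(self, host, port):
        """Block until a slot for (host, port) is free, then hold it."""
        semaphore = self._semaphore_for((host, port))
        semaphore.acquire()
        try:
            yield
        finally:
            semaphore.release()
```

A job would wrap each outbound connection in `with limiter.slot(host, port):` so that at most `max_per_address` connections to that destination are open at once, while connections to other addresses proceed independently. Because the limit is stated over a 2-minute period, recently closed connections may still count against it for a while, so reusing connections (keep-alive or pooling) is likely to help more than opening and closing them quickly.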

Rollback

  1. Revert https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1857 and run the manual apply-to-prod step in the resulting master pipeline.
Edited by Craig Furman