Increase TCP idle timeout on redis-cache nodes

This is a follow-up to #1831 (closed) to make the experimental change permanent.

Production Change - Criticality 4 C4

Change Objective: Let idle client connections live for multiple minutes, so the every-minute workload burst does not have to create as many new connections, saving redis CPU time and memory churn.
Change Type: ConfigurationChange
Services Impacted: redis-cache
Change Team Members: @msmiley
Change Criticality: C4
Change Reviewer or tested in staging: Tested on staging environment, see #1874 (comment 314341173)
Dry-run output: N/A
Due Date: 2020-03-31 01:45 UTC (2020-03-30 18:45 PDT)
Time tracking: 10 minutes (same to rollback)

Detailed steps for the change

Pre-condition

The runtime setting is already 1200 seconds, but the on-disk configuration has not caught up: gitlab.rb still specifies 60 seconds. Note that redis.conf is a separate file from gitlab.rb, so both are checked below.

WARNING: DO NOT run "gitlab-ctl reconfigure", as it would cause redis to restart needlessly and cause downtime. The chef-client run will not run it for that very reason.

$ knife ssh 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config get timeout'
redis-cache-03-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-03-db-gprd.c.gitlab-production.internal 2) "1200"
redis-cache-02-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-02-db-gprd.c.gitlab-production.internal 2) "1200"
redis-cache-01-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-01-db-gprd.c.gitlab-production.internal 2) "1200"

$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "redis.*tcp_timeout" /etc/gitlab/gitlab.rb'
redis-cache-01-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"
redis-cache-02-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"
redis-cache-03-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"

$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "^timeout" /var/opt/gitlab/redis/redis.conf'
redis-cache-01-db-gprd.c.gitlab-production.internal timeout 1200
redis-cache-02-db-gprd.c.gitlab-production.internal timeout 60
redis-cache-03-db-gprd.c.gitlab-production.internal timeout 60
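
To spot which nodes still carry a stale value in redis.conf, the output of the last command can be filtered. This is only a minimal sketch: the helper name stale_timeout_hosts is hypothetical, and it assumes each input line has the form "<host> timeout <seconds>" as in the output above.

```shell
# Hypothetical helper (not part of the original procedure).
# Reads lines of the form "<host> timeout <seconds>" on stdin and prints
# the hosts whose value differs from the expected one passed as $1.
stale_timeout_hosts() {
  expected="$1"
  awk -v want="$expected" '$2 == "timeout" && $3 != want { print $1 }'
}
```

It could be used by piping the knife ssh grep output into it, for example `... | stale_timeout_hosts 1200`. An empty result would mean every listed node already has the expected value on disk.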

Change procedure

  • Back up the redis.conf file for later comparison.
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo cp -p /var/opt/gitlab/redis/redis.conf{,.backup}'
  • Run chef-client, one node at a time, to update the chef-managed gitlab.rb file.
$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' 'sudo chef-client'
  • Run CONFIG REWRITE via redis-cli to update redis.conf to match the runtime state of the redis-server process.
$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config rewrite'

Validation

Verify the only change to redis.conf was the expected change to the timeout setting.

$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' 'sudo diff -U0 /var/opt/gitlab/redis/redis.conf{.backup,}'
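
Reading the diff by eye is fine for three nodes, but the check can also be expressed as a filter. The following is a sketch only: the helper name unexpected_diff_lines is hypothetical, and it assumes it receives raw unified diff output from a single node (without the hostname prefix that knife ssh adds to each line).

```shell
# Hypothetical helper (not part of the original procedure).
# Reads a unified diff on stdin and prints every added or removed line
# that is NOT a "timeout <seconds>" line. No output means the timeout
# setting was the only thing that changed.
unexpected_diff_lines() {
  awk '
    /^(---|\+\+\+) / { next }        # skip file headers
    /^@@/ { next }                    # skip hunk headers
    /^[+-]timeout [0-9]+$/ { next }   # skip the expected change
    /^[+-]/ { print }                 # report any other changed line
  '
}
```

For example, on one node: `sudo diff -U0 /var/opt/gitlab/redis/redis.conf{.backup,} | unexpected_diff_lines`. Any line it prints deserves a closer look before signing off on the change.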

Verify the Redis runtime setting for timeout is still 1200 seconds, and verify the config files now agree:

  • gitlab.rb (chef-managed)
  • redis.conf (gitlab-ctl-managed)
$ knife ssh 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config get timeout'

$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "redis.*tcp_timeout" /etc/gitlab/gitlab.rb'

$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "^timeout" /var/opt/gitlab/redis/redis.conf'

Rollback steps

Since this change aims to make the redis.conf file match the runtime state, no rollback should be needed. If it is, the old redis.conf file can be restored from the redis.conf.backup copy made in the change procedure.
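
If a restore is needed, it could look roughly like the sketch below. The helper name restore_redis_conf is hypothetical, and it assumes the backup sits next to the config file with a .backup suffix, as created in the change procedure. On the production nodes it would have to run with sudo. It only restores the file and does not touch the runtime setting of the redis-server process.

```shell
# Hypothetical helper (not part of the original procedure).
# Restores <conf> from <conf>.backup, preserving mode and timestamps.
# Refuses to do anything if the backup file is missing.
restore_redis_conf() {
  conf="$1"
  backup="${conf}.backup"
  if [ ! -f "$backup" ]; then
    echo "no backup found at $backup" >&2
    return 1
  fi
  cp -p "$backup" "$conf"
}
```

Usage would be `restore_redis_conf /var/opt/gitlab/redis/redis.conf`, followed by the same grep used in the validation step to confirm the file content.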

Changes checklist

  • Detailed steps and rollback steps have been filled prior to commencing work
  • Person on-call has been informed prior to change being rolled out
Edited by Matt Smiley