Increase TCP idle timeout on redis-cache nodes
This is a follow-up to #1831 (closed) to make the experimental change permanent.
C4
Production Change - Criticality 4
Change Objective | Let idle client connections live for multiple minutes, so the every-minute workload burst does not have to create as many new connections, saving redis CPU time and memory churn. |
---|---|
Change Type | ConfigurationChange |
Services Impacted | redis-cache |
Change Team Members | @msmiley |
Change Criticality | C4 |
Change Reviewer or tested in staging | Tested on staging environment: #1874 (comment 314341173) |
Dry-run output | N/A |
Due Date | 2020-03-31 01:45 UTC (2020-03-30 18:45 PDT) |
Time tracking | 10 minutes (same for rollback) |
Detailed steps for the change
Pre-condition
The run-time setting is already 1200 seconds, but the config file still specifies 60 seconds. Note that the redis.conf file differs from the gitlab.rb file.
WARNING: DO NOT run "gitlab-ctl reconfigure", as it would cause redis to restart needlessly and cause downtime. The chef-client run will not run it for that very reason.
$ knife ssh 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config get timeout'
redis-cache-03-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-03-db-gprd.c.gitlab-production.internal 2) "1200"
redis-cache-02-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-02-db-gprd.c.gitlab-production.internal 2) "1200"
redis-cache-01-db-gprd.c.gitlab-production.internal 1) "timeout"
redis-cache-01-db-gprd.c.gitlab-production.internal 2) "1200"
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "redis.*tcp_timeout" /etc/gitlab/gitlab.rb'
redis-cache-01-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"
redis-cache-02-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"
redis-cache-03-db-gprd.c.gitlab-production.internal redis['tcp_timeout'] = "60"
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "^timeout" /var/opt/gitlab/redis/redis.conf'
redis-cache-01-db-gprd.c.gitlab-production.internal timeout 1200
redis-cache-02-db-gprd.c.gitlab-production.internal timeout 60
redis-cache-03-db-gprd.c.gitlab-production.internal timeout 60
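The redis.conf check above can be summarized per host. The following is a minimal sketch, assuming input lines of the form "<host> timeout <seconds>" as in the grep output; the function name find_timeout_drift is hypothetical and not part of any existing tooling.

```shell
# Hypothetical helper: print the hosts whose redis.conf "timeout" value
# differs from the expected value given as the first argument.
# Reads lines like "<host> timeout <seconds>" on stdin.
find_timeout_drift() {
  awk -v want="$1" '$2 == "timeout" && $3 != want { print $1 }'
}

# Possible usage (assumption: same knife query as in the pre-condition check):
#   knife ssh 'roles:gprd-base-db-redis-server-cache' \
#     'sudo grep "^timeout" /var/opt/gitlab/redis/redis.conf' | find_timeout_drift 1200
```

An empty result would mean every host's redis.conf already matches the expected value.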
Change procedure
- Back up the redis.conf file for later comparison.
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo cp -p /var/opt/gitlab/redis/redis.conf{,.backup}'
- Run the apply_to_prod pipeline job for the merge request, which only updates gitlab.rb, not redis.conf: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3020 (relevant pipeline: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/pipelines/128181)
- Run chef-client to update gitlab.rb. This does not have to complete before proceeding to the next step.
$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' 'sudo chef-client'
- Run CONFIG REWRITE via redis-cli to update redis.conf to match the runtime state of the redis-server process.
$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config rewrite'
Validation
Verify the only change to redis.conf was the expected change to the timeout setting.
$ knife ssh -C 1 'roles:gprd-base-db-redis-server-cache' 'sudo diff -U0 /var/opt/gitlab/redis/redis.conf{.backup,}'
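The diff review can also be automated on a single host. This is a sketch only, assuming plain diff output ("< " for removed lines, "> " for added lines) rather than the -U0 form used above; the function name only_timeout_changed is hypothetical.

```shell
# Hypothetical helper: succeed only if every changed line between the two
# files is a "timeout" setting. Any other changed line is printed and the
# function returns non-zero. Identical files also count as success.
only_timeout_changed() {
  # Plain diff prefixes removed lines with "<" and added lines with ">".
  if diff "$1" "$2" | grep '^[<>]' | grep -Ev '^[<>] timeout '; then
    return 1
  fi
  return 0
}

# Possible usage on one node (assumption: run with sufficient privileges):
#   only_timeout_changed /var/opt/gitlab/redis/redis.conf.backup /var/opt/gitlab/redis/redis.conf
```

Printing the unexpected lines, rather than only returning a status, keeps the output useful for pasting into the change issue.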
Verify the Redis runtime setting for timeout is still 1200 seconds, and verify the config files now agree:
- gitlab.rb (chef-managed)
- redis.conf (gitlab-ctl-managed)
$ knife ssh 'roles:gprd-base-db-redis-server-cache' '~/gitlab-redis-cli.sh config get timeout'
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "redis.*tcp_timeout" /etc/gitlab/gitlab.rb'
$ knife ssh 'roles:gprd-base-db-redis-server-cache' 'sudo grep "^timeout" /var/opt/gitlab/redis/redis.conf'
Rollback steps
Since this change aims to make the redis.conf file match the runtime state, no rollback should be needed. If one is needed, the old redis.conf file can be restored from the redis.conf.backup copy made in the change procedure.
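If a restore is needed, it could look like the sketch below. It assumes the backup exists at the ".backup" path created by the cp command in the change procedure; the function name restore_from_backup is hypothetical, and on the redis-cache nodes it would have to run with sudo.

```shell
# Hypothetical helper: restore a config file from its ".backup" copy,
# preserving mode and timestamps. Refuses to act if no backup exists.
restore_from_backup() {
  conf="$1"
  if [ ! -f "$conf.backup" ]; then
    echo "no backup found for $conf" >&2
    return 1
  fi
  cp -p "$conf.backup" "$conf"
}

# Possible usage on one node:
#   restore_from_backup /var/opt/gitlab/redis/redis.conf
```

This only puts the file back; it does not change the running redis-server setting.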
Changes checklist
- Detailed steps and rollback steps have been filled prior to commencing work
- Person on-call has been informed prior to change being rolled out