Increase Redis connection timeout
Production Change - Criticality 4 C4
| Change Objective | Describe the objective of the change |
|---|---|
| Change Type | ConfigurationChange |
| Services Impacted | redis |
| Change Team Members | @mwasilewski-gitlab @igorwwwwwwwwwwwwwwwwwwww |
| Change Criticality | C4 |
| Change Reviewer or tested in staging | A colleague who will review the change or evidence the change was tested on staging environment |
| Dry-run output | If the change is done through a script, it is mandatory to have a dry-run capability in the script, run the change in dry-run mode and output the result |
| Due Date | 2020-03-25 10:40:00 UTC |
| Time tracking | To estimate and record times associated with changes (including a possible rollback) |
This is part of the ongoing investigation: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9420
The purpose of this change is to test the hypothesis that there is a race condition between idle connections timing out on the Redis server side and 1-minute bursts of operations from Puma threads.
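To make the hypothesis concrete, here is a toy model of it, not a measurement. It assumes the server closes a connection exactly when its idle time reaches `timeout`, which is a simplification. The helper `server_closes_first` and the jitter values are hypothetical and chosen only for illustration.

```shell
# Toy model of the hypothesis, not a measurement.
# server_closes_first IDLE_MS TIMEOUT_S (hypothetical helper):
# succeeds when the idle gap has reached the server-side timeout.
server_closes_first() {
  [ "$1" -ge $(( $2 * 1000 )) ]
}

# Hypothetical idle gaps: a 60 s burst interval with +/-50 ms of jitter.
for idle_ms in 59950 60050; do
  for timeout_s in 60 120; do
    if server_closes_first "$idle_ms" "$timeout_s"; then
      echo "idle=${idle_ms}ms timeout=${timeout_s}s: server may close the connection first"
    else
      echo "idle=${idle_ms}ms timeout=${timeout_s}s: connection still open"
    fi
  done
done
```

With a 60 second timeout the outcome flips on a few milliseconds of jitter. With 120 seconds or more the idle gap stays well below the timeout, so if the hypothesis holds, the errors should disappear.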
Pre
> config get timeout
1) "timeout"
2) "60"
The `timeout` setting is described in the self-documented reference Redis config file: https://raw.githubusercontent.com/antirez/redis/5.0/redis.conf
Detailed steps for the change
- $ ssh redis-cache-01-db-gprd.c.gitlab-production.internal
- $ export REDIS_MASTER_AUTH=$(sudo grep ^masterauth /var/opt/gitlab/redis/redis.conf|cut -d\" -f2); /opt/gitlab/embedded/bin/redis-cli -a $REDIS_MASTER_AUTH
- > config set timeout 120
- > config set timeout 360
- > config set timeout 600
- > config set timeout 1200
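For reference, a single step could be wrapped with a dry-run mode roughly as follows. This is a sketch, not the procedure that was run: `DRY_RUN` and `set_timeout` are hypothetical names, and `REDIS_MASTER_AUTH` is assumed to be exported as in the steps above.

```shell
# Sketch: one timeout step with a dry-run mode.
# Assumptions: DRY_RUN and set_timeout are hypothetical names, and
# REDIS_MASTER_AUTH is exported as in the steps above.
REDIS_CLI=/opt/gitlab/embedded/bin/redis-cli
DRY_RUN=${DRY_RUN:-1}   # print only, unless DRY_RUN=0 is set explicitly

set_timeout() {
  # Accept only the planned values; 60 is the rollback value.
  case "$1" in
    60|120|360|600|1200) ;;
    *) echo "unexpected timeout: $1" >&2; return 1 ;;
  esac
  if [ "$DRY_RUN" = "1" ]; then
    # Dry run: show the Redis command without contacting the server.
    echo "[dry-run] config set timeout $1"
  else
    # Apply the value, then read it back to confirm it took effect.
    "$REDIS_CLI" -a "$REDIS_MASTER_AUTH" config set timeout "$1" &&
      "$REDIS_CLI" -a "$REDIS_MASTER_AUTH" config get timeout
  fi
}
```

With the default `DRY_RUN=1`, calling `set_timeout 120` only prints the command it would send. Each step would still be applied one at a time, with time to observe the error rate before moving to the next value.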
Rollback steps
- > config set timeout 60
Changes checklist
- Detailed steps and rollback steps have been filled prior to commencing work
- Person on-call has been informed prior to change being rolled out
Edited by Michal Wasilewski