Increase Redis connection timeout

Production Change - Criticality 4 C4

Change Objective: Describe the objective of the change
Change Type: ConfigurationChange
Services Impacted: redis
Change Team Members: @mwasilewski-gitlab @igorwwwwwwwwwwwwwwwwwwww
Change Criticality: C4
Change Reviewer or tested in staging: A colleague who will review the change or evidence the change was tested on staging environment
Dry-run output: If the change is done through a script, it is mandatory to have a dry-run capability in the script, run the change in dry-run mode and output the result
Due Date: 2020-03-25 10:40:00 UTC
Time tracking: To estimate and record times associated with changes (including a possible rollback)

This is part of the ongoing investigation: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9420

The purpose of this change is to test the hypothesis that there is a race condition between idle connections timing out on the Redis server side and 1-minute bursts of operations from Puma threads.
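To illustrate why the current value is suspect, here is a minimal sketch of the arithmetic. It assumes the bursts are roughly 60 seconds apart, so a pooled connection sits idle for about that long between uses. That interval is an assumption made for illustration, not a measured figure, and `headroom` is a hypothetical helper.

```shell
#!/bin/sh
# ASSUMPTION: bursts arrive about 60 s apart, so a pooled connection
# is idle for roughly this long between two uses.
BURST_INTERVAL=60

# Print the margin, in seconds, between a server-side idle timeout
# and the assumed idle gap. A margin of 0 means the server may close
# the connection just as the client tries to reuse it.
headroom() {
  echo $(( $1 - BURST_INTERVAL ))
}

# Current value followed by the candidate values from the detailed steps.
for t in 60 120 360 600 1200; do
  echo "timeout=${t}s headroom=$(headroom "$t")s"
done
```

Under this assumption the current timeout of 60 leaves no margin, while each candidate value leaves at least a full burst interval.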

Pre-change state

> config get timeout
1) "timeout"
2) "60"

The timeout setting is documented in the reference Redis config file: https://raw.githubusercontent.com/antirez/redis/5.0/redis.conf

Detailed steps for the change

  • $ ssh redis-cache-01-db-gprd.c.gitlab-production.internal
  • $ export REDIS_MASTER_AUTH=$(sudo grep ^masterauth /var/opt/gitlab/redis/redis.conf | cut -d\" -f2); /opt/gitlab/embedded/bin/redis-cli -a "$REDIS_MASTER_AUTH"
  • > config set timeout 120
  • > config set timeout 360
  • > config set timeout 600
  • > config set timeout 1200
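If the config set steps above were scripted rather than typed interactively, a dry-run mode could look roughly like the sketch below. The function name `set_timeout` and the `DRY_RUN` and `REDIS_CLI` variables are hypothetical, introduced only for illustration. `REDIS_MASTER_AUTH` is assumed to be exported as in the step above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original change plan.
# Path to redis-cli as used in the steps above; can be overridden.
REDIS_CLI="${REDIS_CLI:-/opt/gitlab/embedded/bin/redis-cli}"

# Apply one timeout value, or only print the command when DRY_RUN=1.
# DRY_RUN defaults to 1, so nothing is changed unless explicitly requested.
set_timeout() {
  seconds="$1"
  # Reject anything that is not a non-negative integer.
  case "$seconds" in
    ''|*[!0-9]*) echo "invalid timeout: $seconds" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY RUN: redis-cli config set timeout $seconds"
  else
    "$REDIS_CLI" -a "$REDIS_MASTER_AUTH" config set timeout "$seconds"
  fi
}
```

Calling `set_timeout 120` prints the command without touching the server. Running it with `DRY_RUN=0` exported would apply the value.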

Rollback steps

  • > config set timeout 60
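After a rollback it may be worth confirming that the value is back to 60. The sketch below parses the output of config get timeout. `timeout_value` is a hypothetical helper. It is written to accept both the numbered form shown in the interactive session earlier and the raw two-line form that redis-cli prints when its output is piped.

```shell
#!/bin/sh
# Hypothetical helper: read `config get timeout` output on stdin and
# print only the numeric value.
# Interactive form:  1) "timeout" / 2) "60"
# Piped (raw) form:  timeout / 60
timeout_value() {
  # The value is on the last line; strip a leading 'N) ' index and any quotes.
  tail -n 1 | sed -e 's/^[0-9]*) //' -e 's/"//g'
}
```

For example, piping `/opt/gitlab/embedded/bin/redis-cli -a "$REDIS_MASTER_AUTH" config get timeout` into `timeout_value` and comparing the result with 60 would confirm the rollback.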

Changes checklist

  • Detailed steps and rollback steps have been filled prior to commencing work
  • Person on-call has been informed prior to change being rolled out
Edited by Michal Wasilewski