Upgrade Redis to the latest version shipped in Omnibus
While attempting to disable persistence on the Redis cache nodes (see https://gitlab.com/gitlab-com/infrastructure/issues/2448#note_48616410), it became clear that we can't do it yet: the setting that needs to be tweaked is only present in Omnibus >= 10.2, and we're still running 10.0.1 on all Redis/Sentinel nodes.
We could fix this easily: just bump the version pinning up to 10.3.3 and let Chef do the honours.
BUT
That would trigger a reconfigure (read: a Redis restart) on all Redis nodes within a window of 10 minutes, including the persistent nodes and the cache sentinels. Also, it will install Redis 3.2.11 while we're currently running 3.2.5 so we need to ensure there are no surprises in the changelog.
So this operation will require two more failovers, one for the persistent cluster (relatively easy now that the dataset is under control) and one for the cache (not as easy).
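To check that a node really picked up the new Redis after its upgrade, something like the following could be run against the output of `redis-cli INFO server`. This is only a sketch: the function names are made up for illustration, and the 3.2.11 target is simply the version mentioned above. Versions are compared as integer tuples because a plain string comparison would rank 3.2.11 below 3.2.5.

```python
def parse_info(raw: str) -> dict:
    """Parse the text returned by the Redis INFO command into a dict."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Server".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def version_tuple(version: str) -> tuple:
    """Turn "3.2.11" into (3, 2, 11) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_upgraded(raw_info: str, target: str = "3.2.11") -> bool:
    """True if the node reports at least the target version (assumed 3.2.11)."""
    running = parse_info(raw_info)["redis_version"]
    return version_tuple(running) >= version_tuple(target)
```

Feeding it the INFO output of a node still on 3.2.5 returns `False`; once the node reports 3.2.11 it returns `True`.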
Here's the plan in detail, assuming that redis01 and redis-cache-01 are the current masters:
- Confirm that we won't be impacted by the upgrade. This may seem like overkill, as the version currently running in production is compatible with the Redis version bundled in Omnibus, but better safe than sorry.
- Stop chef-client on every redis and sentinel node.
- Upgrade redis-cache-03 (30 mins) and redis03 (this will also upgrade sentinel).
- Upgrade redis-cache-02 (30 mins) and redis02 (this will also upgrade sentinel).
- Upgrade the sentinels for cache one at a time.
- On maintenance day, fail over both the persistent and cache clusters.
- Upgrade redis-cache-01 (30 mins) and redis01 (this will also upgrade sentinel).
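The plan assumes we know which node is the master in each cluster. Rather than relying on that assumption, the order could be derived from what each node reports in `redis-cli INFO replication`. A minimal sketch, assuming the INFO text has already been collected per host (the helper names and the input dict are hypothetical, not existing tooling):

```python
def role_of(raw_info: str) -> str:
    """Extract the role field ("master" or "slave") from INFO replication output."""
    for line in raw_info.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no role field in INFO output")


def upgrade_order(info_by_host: dict) -> list:
    """Return hosts with replicas first and the current master last.

    info_by_host maps a hostname to its raw INFO replication text
    (hypothetical input format, gathered however is convenient).
    """
    masters = [h for h, raw in info_by_host.items() if role_of(raw) == "master"]
    if len(masters) != 1:
        # Zero or several masters means something is off; stop before upgrading.
        raise ValueError(f"expected exactly one master, found {masters}")
    # Highest-numbered replica first, mirroring the plan (03, then 02).
    replicas = sorted((h for h in info_by_host if h != masters[0]), reverse=True)
    return replicas + masters
```

With redis-cache-01 as master, this yields redis-cache-03, redis-cache-02, redis-cache-01. The master only gets upgraded after the failover on maintenance day, by which point it should report itself as a replica.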
An alternative approach could be to move the Omnibus version block to each role but that will inevitably increase version entropy, which will put other "surprises" around the corner in the coming months.
/cc @gl-infra and @edjdev for awareness.