As noted in #1519 (comment 270779238), the chart is not able to use Redis Sentinel support when it deploys the Redis cluster itself. Sentinel support is still available if the Redis cluster is created separately from the GitLab Helm chart.
There is a naming difference in the Redis Service when Sentinel support is activated. Without Sentinel support, the Kubernetes Service name is <RELEASE>-redis-master-0. When Sentinel support is turned on, the Service name becomes <RELEASE>-redis. When configuring the Redis endpoints within the GitLab sub-charts, it is not possible to interrogate the redis.sentinel.enabled setting to determine which Service name to use.
At this point, one option is to create a global Helm setting, not automatically synchronized with redis.sentinel.enabled, that allows the correct Service name to be chosen. This will probably lead to misconfigurations and problematic GitLab installs unless a mechanism can be found to synchronize or validate the configuration during installation and upgrades.
Another option that may work, but has not been proven yet, is to use redis.nameOverride. This may allow the Service name to be fixed across all configurations, but it is uncertain how many other objects would be affected by this setting.
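For illustration, a minimal (untested) sketch of that second option, assuming the sub-chart honors the standard nameOverride value; the name used here is arbitrary:

```yaml
# Untested sketch: pin the Redis sub-chart's base name so the rendered Service
# names no longer flip depending on redis.sentinel.enabled. The name is
# arbitrary, and it is still unclear which other objects would be renamed.
redis:
  nameOverride: gitlab-redis
```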
> Without Sentinel support, the Kubernetes Service name is <RELEASE>-redis-master-0. When Sentinel support is turned on, the Service name becomes <RELEASE>-redis.
It seems it might be more straightforward to get the upstream Redis chart to render a Service with the same name at all times?
I did a quick inspection of the Redis chart, but it did not appear that this would be a trivial thing to accomplish. Each of the templates in the Redis chart does use {{ template "redis.fullname" . }} to build the Service name, but the master and slave Services then hardcode -master and -slave suffixes.
Correction: the master Service name is <RELEASE>-redis-master. Not sure why I added the -0 in the description.
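For illustration, the pattern in question looks roughly like the following (paraphrased, not a verbatim excerpt from the upstream chart):

```yaml
# Simplified illustration of the upstream pattern: the base name comes from the
# shared named template, but the role suffix is hardcoded per Service.
apiVersion: v1
kind: Service
metadata:
  name: {{ template "redis.fullname" . }}-master
```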
As shown in the Distribution demo on 2020-01-31, we can actually change the template to default to RELEASE-redis-headless, and have it work for both cases.
Here's how I currently make it work with the 3.0.2 tag:
```yaml
# .Release.Name = `demo`
global:
  redis:
    host: demo-redis-headless   # aligns to redis.sentinel.masterSet
    sentinels:
      # these have to be manually defined
      - host: demo-redis-headless
        port: 26379
redis:
  cluster:
    enabled: true
    slaveCount: 2   # this results in 1+2 => 1+N, so we can get quorum for Sentinel
  sentinel:
    enabled: true
    usePassword: false
    # we set this to match the Service name, for components not able to use Sentinel
    # (may not be necessary)
    # defaults (upstream) to `mymaster`
    masterSet: demo-redis-headless
```
I retried this earlier today, and masterSet was useless for me. I had to use the following:
```yaml
redis:
  cluster:
    enabled: true
  sentinel:
    enabled: true
    usePassword: false   # gitlab-workhorse does not support Sentinel authentication
global:
  redis:
    host: mymaster
    sentinels:
      - host: gitlab-redis-headless
```
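Presumably this works because, once global.redis.sentinels is populated, global.redis.host is treated as the Sentinel master set name rather than as a resolvable hostname, and mymaster is the upstream default for sentinel.masterSet.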
Trying out @WarheadsSE's second recommended YAML, I'm noticing an oddity / potential issue. It seems that none of the slaves will come up if the gitlab-redis-master-0 pod is down. Once the master is up, the slaves come up fine, and the master can later be taken down without an issue.
Tested this by having 1 master and 3 slaves:
1. Scaled the gitlab-redis-master StatefulSet down to 0.
2. Deleted the gitlab-redis-slave-2 pod (Sentinel said it was not the current master).
3. Kubernetes attempted to bring the slave pod back automatically, but it went into CrashLoopBackOff.
4. Scaled the master StatefulSet back up to 1, which recreated gitlab-redis-master-0, and shortly thereafter gitlab-redis-slave-2 came up fully.
I also went ahead and changed the default timeout values, because waiting a minute for Sentinel to detect that the node with role=master was down was getting annoying.
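For reference, a hedged sketch of the kind of change I mean, assuming the chart version in use exposes the sentinel.downAfterMilliseconds and sentinel.failoverTimeout values (the numbers are only examples):

```yaml
# Example only: shorten Sentinel's failure-detection and failover windows.
# The chart's detection default is roughly a minute.
redis:
  sentinel:
    enabled: true
    downAfterMilliseconds: 10000   # declare the master down after 10s
    failoverTimeout: 15000         # allow 15s for the failover to complete
```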
I have finally finished my testing of using this chart with redis sentinel.
This is the list of problems I faced:
Sentinel electing a master with the address 127.0.0.1:6379 - fixed in newer versions of the Bitnami chart.
Occasional Sentinel split-brain: new or restarted sentinels sometimes weren't able to join the current cluster and started to work standalone.
Failover is not performed automatically during a graceful shutdown of the current master pod. (It is performed later, once the cluster sees that the master has been unavailable for longer than the configured period; a hedged sketch of one possible workaround follows after this list.)
I even started a PR for that, but it is still not finished: https://github.com/bitnami/charts/pull/2820.
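For context (and not necessarily what that PR does), a hedged sketch of the workaround I had in mind: ask the Sentinel sidecar to fail over before the master container is stopped, via a preStop hook on the master pod. This assumes a Sentinel sidecar on localhost:26379 and the upstream default master set name mymaster.

```yaml
# Hedged sketch, not the linked PR: trigger a Sentinel failover before the
# master shuts down so clients are redirected ahead of the stop, then give the
# promotion a moment to complete.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - redis-cli -p 26379 SENTINEL FAILOVER mymaster || true; sleep 15
```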
For now I am using the spotahome redis-operator for my GitLab installation, and it works with far fewer problems than the Bitnami Redis chart.
Actually it's very strange to use Sentinel with an operator (it seems the operator should replace Sentinel, since both are implemented to control the lifecycle of a Redis cluster), but it works.
> Actually it's very strange to use Sentinel with an operator (it seems the operator should replace Sentinel, since both are implemented to control the lifecycle of a Redis cluster), but it works.
I'll have to look into this operator to see exactly how it is behaving.
Sentinel is a known, tested, and common pattern for Redis that acts as a balancer in front of Redis (replicas) in order to create High Availability and handle failover.
Cluster is a means of sharding across multiple primaries, which gains resiliency in the event of partial cluster loss. Combining the two is how you can get both spread and HA, but it's not very straightforward from a service implementation and client perspective, as there are several complications that many client implementations won't understand.
Bitnami actually has two separate charts for these: bitnami/redis and bitnami/redis-cluster. Unfortunately, the use of the term "cluster" in the property names (cluster.x) conflates the two actual implementations.
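To make the collision concrete, in bitnami/redis the cluster.* keys only shape the master/replica topology that Sentinel then watches (as in the values used above); they do not enable Redis Cluster sharding, which is what the separate bitnami/redis-cluster chart is for:

```yaml
# bitnami/redis values (nested under redis: when used as the GitLab sub-chart):
# "cluster" here means one master plus N replicas, optionally fronted by
# Sentinel -- it is not Redis Cluster (sharded) mode.
redis:
  cluster:
    enabled: true     # deploy a master plus slaveCount replicas
    slaveCount: 2
  sentinel:
    enabled: true     # Sentinel sidecars handle monitoring and failover
```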
Just to make things clear:
When I used the word "cluster" in my previous message, I meant Redis Sentinel instances, not Redis Cluster (multi-master sharding).
Also, the spotahome redis-operator is only able to work with Sentinel, and as far as I know it does not support Redis Cluster mode.
Contributions like this are vital to help make GitLab a better product.
We would be grateful for your help in verifying whether your bug report requires further attention from the team. If you think this bug still exists, and is reproducible with the latest stable version of GitLab, please comment on this issue.
This issue has been inactive for more than 12 months now and based on the policy for inactive bugs, will be closed in 7 days.
Thanks for your contributions to make GitLab better!