[Discussion] Improve redis.yml robustness
Problem
The creation of `config/redis.yml` allowed us to insert new Redis configurations without going through charts and omnibus-gitlab. While this speeds up provisioning work, it adds a layer of failure to our deployment process. This brittleness has been exposed several times:
- during the `redis-chat-cache` provisioning, we had to patch charts (gitlab-org/charts/gitlab!3188 (merged)) to handle passwords.
- during the rollout of &1010 and &979 (closed), the deployer node failed to run database migrations because the password was defined in the omnibus vault secrets but the config for the respective cache had not been updated (chef-client had not run).
- in the recovery stages of production#15997 (closed), the console was not able to run because the password was removed slightly after the revert MR was merged.
How is omnibus-gitlab brittle because of redis.yml?
If `"omnibus-gitlab".gitlab_rb."gitlab-rails".redis_yml_override.<redis instance name>.password` is defined, `gitlab-vault` loads it into the `node` object as `node['redis_yml_override'][<redis instance name>]['password']`.
This means `node['redis_yml_override'][<redis instance name>]` is present, causing it to be loaded wholesale into the `redis.yml`.
production:
  <xxx>: # this is invalid as it only has password
    password: REDACTED
  <other instances>:
    username: ...
    cluster: ...
Since `cluster` does not exist either, the application attempts to parse the `url` key, which is `nil`. This raises an error and the application fails to start up.
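The failure mode can be reproduced with a minimal sketch. `connection_params` is a hypothetical stand-in for the application's config parsing, not the actual GitLab code, and the instance names and URL are placeholders. It assumes only what is described above: an entry without `cluster` must provide a `url`.

```ruby
require "yaml"

# Rendered redis.yml in which one instance carries only a password.
# Instance names and values are placeholders for illustration.
REDIS_YML = <<~CONFIG
  production:
    chat_cache:
      password: REDACTED
    other_cache:
      url: "redis://localhost:6379"
CONFIG

# Hypothetical stand-in for the application's parsing: use `cluster`
# when present, otherwise require a `url`.
def connection_params(entry)
  return { cluster: entry["cluster"], password: entry["password"] } if entry.key?("cluster")

  url = entry["url"]
  raise ArgumentError, "redis config has neither cluster nor url" if url.nil?

  { url: url, password: entry["password"] }
end

config = YAML.safe_load(REDIS_YML).fetch("production")

# A complete entry parses without error.
connection_params(config["other_cache"])

# The password-only entry raises, which is where startup would abort.
begin
  connection_params(config["chat_cache"])
rescue ArgumentError => e
  warn "startup would fail: #{e.message}"
end
```

The point of the sketch is that the entry is syntactically valid YAML, so nothing fails until the application tries to build a connection from it.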
Discussion
Defining a `password` in the vault without config changes in chef-repo should not break our deployments. A naive fix on omnibus would be to remove keys where only `password` is defined but that seems oddly specific to our use case.
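For discussion, the naive fix could look roughly like the sketch below. It works on plain Ruby hashes rather than the Chef `node` object, and `drop_password_only` is a hypothetical helper name, not an existing omnibus-gitlab function.

```ruby
# Hypothetical helper: drop override entries whose only key is `password`,
# so a vault secret without matching chef-repo config is never rendered.
def drop_password_only(overrides)
  overrides.reject { |_name, settings| settings.keys.map(&:to_s) == ["password"] }
end

# Placeholder instance names and values.
overrides = {
  "chat_cache" => { "password" => "REDACTED" },
  "other_cache" => { "password" => "REDACTED", "url" => "redis://localhost:6379" }
}

drop_password_only(overrides).keys # => ["other_cache"]
```

The filter encodes exactly one shape of incomplete entry, which illustrates why it feels specific to our use case: an entry with, say, only `username` and `password` would still pass through.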
We could fix it on the application side by falling back to the next config when a config is invalid, but that would introduce complexity into a component that should be kept relatively simple (the application should decide on a config file to read from, read it, and raise an error if it is invalid).