Corrective action: Increase open file limit for redis.
## Summary

During https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14497 we attempted to increase the `redis-cache` `maxclients` setting to 60k. However, this setting only takes effect if the process's `Max open files` limit is raised to match.

Currently the redis process gets its limits from the `/opt/gitlab/embedded/bin/runsvdir-start` script, which is called by the systemd `gitlab-runsvdir` service. To bump `maxclients`, we need to be able to modify and increase the limit set by the `ulimit -n 50000` line in the `runsvdir-start` script.

We proved this by manually modifying the script on a redis-cache secondary instance in gstg. See details at https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14497#note_1400579323

<!--
Give context for what problem this issue is trying to prevent from happening again.
Provide a brief assessment of the risk (chance and impact) of the problem that this corrective action fixes, to assist with triage and prioritization.
-->

## Related Incident(s)

<!--
Note the originating incident(s) and link known related incidents/other issues. The relation will happen automatically if you are creating this issue from an incident, if this isn't done already please uncomment the following line:
-->

Originating issue(s): gitlab-com/gl-infra/production#ISSUE_ID

## Desired Outcome/Acceptance Criteria

- Write a Chef recipe that allows overriding the `Max open files` setting for the redis instance by modifying the limit value in `/opt/gitlab/embedded/bin/runsvdir-start`.
- Set an attribute override in [gprd-base-db-redis-server-cache.json](https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/blob/master/roles/gprd-base-db-redis-server-cache.json) to increase `Max open files` to 60k.
- Apply the changes to the `redis-cache` instances in GPRD.

<!--
How will you know that this issue is complete?
If you have any initial thoughts on implementation details (e.g. what to do or not do, gotchas, edge cases etc.), please share them while they are fresh in your mind.
-->

## Associated Services

<!--
Apply the appropriate services associated with this corrective action if applicable.
~Service::SERVICE_NAME
-->

## Corrective Action Issue Checklist

* [ ] Link the incident(s) this corrective action arose out of
* [ ] Give context for what problem this corrective action is trying to prevent from re-occurring
* [ ] Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
* [ ] Assign a priority (this will default to 'Reliability::P4')
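
As a starting point for the first acceptance criterion, the rewrite of the `ulimit -n` line could be sketched in plain Ruby as below. This is only an illustration, not the final recipe: the helper name `override_nofile_limit` and the sample script text are hypothetical, and the real contents of `runsvdir-start` may differ from the stand-in used here. In a Chef recipe, logic like this would presumably be wrapped in a resource that reads the file, applies the change, and writes it back only when the content differs.

```ruby
# Matches a line of the form `ulimit -n <number>`, capturing everything
# up to the number so that indentation and spacing are preserved.
ULIMIT_LINE = /^([ \t]*ulimit[ \t]+-n[ \t]+)\d+[ \t]*$/

# Hypothetical helper: returns the script text with the `ulimit -n <N>` line
# rewritten to the requested limit. It raises if no such line exists, so a
# change in the packaged script is noticed instead of silently ignored.
def override_nofile_limit(script, limit)
  unless limit.is_a?(Integer) && limit.positive?
    raise ArgumentError, "limit must be a positive integer"
  end
  unless script.match?(ULIMIT_LINE)
    raise ArgumentError, "no `ulimit -n` line found in script"
  end

  script.gsub(ULIMIT_LINE) { "#{Regexp.last_match(1)}#{limit}" }
end

# Made-up stand-in for the script contents, used only to exercise the helper.
sample = "#!/bin/sh\nulimit -n 50000\nexec true\n"
puts override_nofile_limit(sample, 60_000)
```

One gotcha to check while implementing: as far as the Redis documentation describes it, Redis reserves a small number of file descriptors (32) for internal use and lowers `maxclients` if the limit cannot cover both, so the open file limit may need to sit slightly above the desired `maxclients` value rather than exactly at it.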
issue