Assess using a Redis cluster for the Container Registry to free up compute resources
The Container Registry stores blob information in memory. This is problematic because that information is cacheable: the cache lives in process memory and is only released when the service stops. As a result, the Container Registry will always look like a memory-leaking application until we move this cacheable data to some other form of storage. Please note that we've been operating this way for many years, and for the most part it has not been a big issue, so the priority of this is relatively low.
Historically, when the Container Registry was running on VMs, we had a cron job that intentionally restarted the services. Now that we are on Kubernetes, we leverage Kubernetes resource limits to control memory usage; the limit is currently set to 4GB. We should improve on this, as killing the process when it hits a limit is not a good solution for a service that needs to operate at scale, and this will only become more important as GitLab grows. Let's consider implementing some form of external cache storage for the Container Registry to free up compute resources at the application level and shift that work to a better-suited, centralized, and shared service.
Utilize this issue to determine a few things prior to implementation:
- Is Redis our only option? Are there other technologies better suited for this?
- Can we re-use our existing Redis clusters?
- Does the Container Registry support Sentinel if we re-use existing clusters?
- Evaluate what the data looks like over time. We need to know what limits to set on key expiration, if any, and our max-memory profile in order to provision the appropriate resources.
- Do we need to back up this data, or can we live with a completely dropped database in the case of failed failover attempts on the cluster?
- etc
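On the key-expiration and max-memory questions: since this is cache data that can always be rebuilt from backend storage, one plausible approach (an assumption to validate, not a decision) is to cap the instance's memory and let Redis evict least-recently-used keys rather than backing anything up. As a sketch, the relevant `redis.conf` directives would look like:

```
# Cap memory at roughly the limit we currently enforce via Kubernetes.
maxmemory 4gb

# Evict least-recently-used keys when the cap is reached; safe for a
# pure cache because entries can be rebuilt from backend storage.
maxmemory-policy allkeys-lru
```

The exact `maxmemory` value should come out of the data-growth evaluation above rather than simply mirroring today's Kubernetes limit.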