Recommended Sidekiq configuration for temporary files
Summary
Some queues processed by Sidekiq do not clean up after themselves. Over time, this can leave Sidekiq Pods using an unnecessarily large amount of disk space. Here's a hypothetical example:
For GitLab.com, we have a fleet of Sidekiq servers dedicated to processing imports of various types. This fleet runs the following two queues:
- github_importer
- repository_import
When an import completes, we do not appear to clean up the data it leaves behind (see gitlab-com/gl-infra/production#1526, closed).
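For illustration only, here is a minimal sketch of a job that cannot leave data behind. It is written in Python rather than taken from the actual importer code, and `run_import` is a hypothetical name. Each job gets its own scratch directory from `tempfile.TemporaryDirectory`, which is deleted when the `with` block exits, whether the job succeeds or raises.

```python
import os
import tempfile
from typing import Callable, Optional


def run_import(work: Callable[[str], None], base_dir: Optional[str] = None) -> str:
    """Run one hypothetical import job inside a private scratch directory.

    The directory is created under base_dir (or the system temp dir) and is
    removed when the with-block exits, on success or on an exception.
    Returns the path that was used so callers can confirm it is gone.
    """
    with tempfile.TemporaryDirectory(prefix="import-", dir=base_dir) as scratch:
        work(scratch)  # the job writes clones, archives, etc. only under scratch
        return scratch  # __exit__ still runs and deletes the directory


if __name__ == "__main__":
    def fake_work(path: str) -> None:
        # Simulate an import writing a large intermediate file.
        with open(os.path.join(path, "repo.bundle"), "wb") as f:
            f.write(b"x" * 1024)

    used = run_import(fake_work)
    print(used, "exists after job:", os.path.exists(used))
```

The design choice is to tie the lifetime of temporary data to the job itself, so cleanup does not depend on Pods being cycled.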
What is the recommended way to handle this in a Helm installation of GitLab? Leftover files are tolerable in a small installation that runs Sidekiq in the all-in-one mode and never scales it, but that is not the recommended install method. Unless Pods are cycled regularly, the underlying node will eventually run out of disk space.
While working on this for a different queue, the Infrastructure team decided to dedicate an emptyDir volume mounted at the location where temporary data is written: gitlab-com/gl-infra/k8s-workloads/gitlab-com!105 (merged)
However, this is only a stop-gap solution: it is not well proven and will have other consequences.
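As a rough sketch of what such an emptyDir configuration looks like (not the contents of the merge request above), the Python function below builds a Pod spec fragment as a plain dict and prints it as JSON, which Kubernetes accepts as well as YAML. The volume name, container name, mount path, and size limit are all hypothetical placeholders. The `sizeLimit` field is optional; when it is set, the kubelet can evict the Pod if the volume grows past it.

```python
import json


def scratch_volume_fragment(container: str, mount_path: str, size_limit: str) -> dict:
    """Build a Pod spec fragment adding an emptyDir scratch volume.

    All argument values are hypothetical; only the field names
    (volumes, emptyDir, sizeLimit, volumeMounts, mountPath) are Kubernetes API fields.
    """
    volume_name = "import-scratch"  # hypothetical name; any DNS-1123 label works
    return {
        "volumes": [
            {
                "name": volume_name,
                # emptyDir lives as long as the Pod; sizeLimit caps its usage.
                "emptyDir": {"sizeLimit": size_limit},
            }
        ],
        "containers": [
            {
                "name": container,
                # Mount the scratch volume where the job writes temporary data.
                "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
            }
        ],
    }


if __name__ == "__main__":
    # Assumed values, for illustration only.
    fragment = scratch_volume_fragment("sidekiq", "/var/tmp/sidekiq-scratch", "20Gi")
    print(json.dumps(fragment, indent=2))
```

One trade-off of this design: the data is discarded whenever the Pod goes away, and hitting the limit evicts the whole Pod rather than failing a single job.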