Sidekiq queues not cleaning up temporary data when job is complete or failed
Summary
Some Sidekiq jobs do not appear to clean up the data gathered for processing when the job completes. Whether or not the job succeeds, the data left behind will eventually fill the disk. For GitLab.com this has historically been handled by mounting a massive NFS share at the shared directory and running a cron job that performs the cleanup; because the mount is shared, that cleanup applies to all Sidekiq servers. This creates at least two issues:
- Due to performance issues, the Infrastructure team has been trying to remove all NFS mounts. With the mount removed, the sidekiq-import servers are slowly filling up their disks.
- As we move to Kubernetes, this will be more difficult. Not all Pods run the cleanup, and eventually those Pods will run the underlying node out of disk space.
Jobs should clean up after themselves once data processing has completed. This may require further investigation; however, the two queues identified so far that run on our GitLab.com sidekiq-import fleet are the following:
- github_importer
- repository_import
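As a rough illustration of what "cleaning up after themselves" could look like, here is a minimal sketch in plain Ruby. `ExampleImportJob`, `process`, and the `base_dir` argument are hypothetical names used only for illustration; they are not the actual GitLab worker classes or their signatures. The idea is to create a per-job scratch directory and remove it in an `ensure` block, so the removal runs whether the job succeeds or raises.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for an import worker; not an actual GitLab class.
class ExampleImportJob
  # base_dir: parent directory for scratch data (assumed to be something
  # like the shared tmp path mentioned in this issue).
  def perform(base_dir)
    work_dir = Dir.mktmpdir("import-", base_dir)
    process(work_dir)
  ensure
    # Runs on success and on failure. work_dir is nil if mktmpdir itself raised.
    FileUtils.remove_entry(work_dir) if work_dir && File.exist?(work_dir)
  end

  private

  # Placeholder for the real import work: writes some scratch data.
  def process(work_dir)
    File.write(File.join(work_dir, "payload.tmp"), "example data")
  end
end
```

Note that an `ensure` block does not run if the process is killed outright (for example with SIGKILL), so a periodic sweep of stale directories may still be needed as a backstop.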
After a project import completes, data is left behind on disk. This has caused deployment issues and will be a problem when these queues are migrated to Kubernetes. See the related issues below for more context and details.
Steps to reproduce
- View disk space utilized on #{GitlabInstallPath}/gitlab-rails/shared/tmp
- Perform a project import
- View disk space utilized on #{GitlabInstallPath}/gitlab-rails/shared/tmp
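One way to make the before/after comparison concrete is a small helper that sums file sizes under a directory. This is only a sketch: `dir_usage_bytes` is a hypothetical helper name, and the directory to pass in is whatever the shared tmp path resolves to on the installation being tested.

```ruby
require "find"

# Sum the sizes, in bytes, of all regular files under dir.
# Symlinks are not followed; files that vanish mid-scan are skipped.
def dir_usage_bytes(dir)
  total = 0
  Find.find(dir) do |path|
    begin
      stat = File.lstat(path)
      total += stat.size if stat.file?
    rescue Errno::ENOENT
      next
    end
  end
  total
end
```

Calling the helper on the tmp directory before the import and again after the job finishes gives two numbers; if the second is larger and stays larger, data was left behind.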
What is the current bug behavior?
Data is left behind on disk after the job completes (successful or not)
What is the expected correct behavior?
Jobs should clean up the data
Relevant logs and/or screenshots
Related issue for the GitLab.com Infrastructure team: gitlab-com/gl-infra/production#1526 (closed)
Desired production change to remove a large shared file system from all sidekiq servers: gitlab-com/gl-infra/production#1238 (closed)
Related issue for our Helm chart installation method: gitlab-org/charts/gitlab#1785
Output of checks
This bug happens on GitLab.com