Create a rake task to clean up unused LFS files
Problem to solve
Users or admins run up against storage limits, and realize that they are using a lot of storage on LFS objects they no longer want or need. Then they find there is no way to remove them from the GitLab-managed LFS storage without deleting the project.
The Git repo is the SSOT, but the data is elsewhere, only tracked by project. So in order to know if you can delete the data, you have to scan the repo or have been tracking the pointers the whole time.
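To make the scan concrete, here is a minimal Ruby sketch of the check described above. It is not GitLab's implementation. The method names and both inputs (`stored_oids`, `repo_blobs`) are hypothetical, and it assumes the standard Git LFS pointer format, where each pointer blob starts with a `version` line and carries an `oid sha256:<hex>` line.

```ruby
require 'set'

# Matches the "oid sha256:<64 hex chars>" line of a Git LFS pointer blob.
LFS_OID_LINE = /^oid sha256:(\h{64})$/
LFS_VERSION_PREFIX = 'version https://git-lfs.github.com/spec/v1'

# Returns the OID found in a pointer blob, or nil if the text is not a pointer.
def lfs_oid(blob_text)
  return nil unless blob_text.start_with?(LFS_VERSION_PREFIX)

  match = LFS_OID_LINE.match(blob_text)
  match && match[1]
end

# stored_oids: OIDs the project tracks in LFS storage (hypothetical input).
# repo_blobs:  text of every blob reachable from any ref (hypothetical input;
#              collecting these is the expensive repository scan).
# Returns the set of stored OIDs that no pointer in the repo references.
def unreferenced_lfs_oids(stored_oids, repo_blobs)
  referenced = repo_blobs.map { |text| lfs_oid(text) }.compact.to_set
  stored_oids.to_set - referenced
end
```

Only the OIDs in the returned set would be candidates for deletion. Anything still referenced by a pointer on any ref has to stay, which is why the scan must cover the whole repository history rather than just the default branch.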
The ideal approach is to track pointers the whole time and automatically delete objects once they are unreferenced. It is technically possible and the most user-friendly solution. We started down this road in gitlab-foss!14479 (closed), but that is a large, complex MR and, from my limited understanding, more than weight 5, especially once performance validation is included. There are apparently many performance risks and pitfalls, and it looks like existing repos weren't handled yet.
Rather than focusing on the ideal solution first, we can iterate using a boring solution: create a rake task that cleans a single project in a non-performant way. This would also let us validate our performance concerns. From there, we could perhaps implement a way for project Maintainers to queue a clean, and run the queued cleans one at a time. It's not ideal, but users would at least have some recourse. It is similar to how we handle other cleanup tasks.
This was also proposed already in this comment: #8922 (comment 244261646)
The rake task could be:
sudo gitlab-rake gitlab:cleanup:lfs_files
Permissions and Security
- I believe this requires
- Needs to be well tested, because otherwise it may remove data that is still needed
What does success look like, and how can we measure that?
A systems administrator can schedule
sudo gitlab-rake gitlab:cleanup:lfs_files to run on a per-project basis, e.g. over the weekend, to remove LFS files that are no longer required.
What is the type of buyer?
- All tiers