  • #36628
Closed
Issue created Nov 18, 2019 by Fabian Zimmer (@fzimmer), Developer

Create a rake task to cleanup unused LFS files

Problem to solve

Users or admins run up against storage limits and realize that much of their storage is consumed by LFS objects they no longer want or need. They then find that there is no way to remove these objects from the GitLab-managed LFS storage without deleting the project.

Intended users

  • Sidney (Systems Administrator)

Further details

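The Git repo is the single source of truth (SSOT) for which LFS objects are still referenced, but the object data is stored elsewhere and is tracked only per project. So in order to know whether an object can be deleted, you either have to scan the repo for LFS pointers or have been tracking the pointers the whole time.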
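The "scan the repo" option can be sketched as follows. This is only an illustration under stated assumptions, not how GitLab implements it: the function names `parse_lfs_pointer` and `unreferenced_oids` are hypothetical, the blobs are assumed to be already read from the repository as text, and only the current `version https://git-lfs.github.com/spec/v1` pointer format with SHA-256 OIDs is recognized.

```python
import re

# A Git LFS pointer is a small text file; its "oid" line names the stored object.
_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"
_OID_LINE = re.compile(r"^oid sha256:([0-9a-f]{64})$", re.MULTILINE)


def parse_lfs_pointer(blob_text):
    """Return the SHA-256 OID if blob_text looks like an LFS pointer, else None."""
    if not blob_text.startswith(_VERSION_LINE):
        return None
    match = _OID_LINE.search(blob_text)
    return match.group(1) if match else None


def unreferenced_oids(tracked_oids, repo_blobs):
    """Return the OIDs tracked for a project that no scanned blob points to.

    tracked_oids: OIDs the project is known to own (hypothetical input).
    repo_blobs:   text contents of every blob reachable in the repository.
    """
    referenced = set()
    for text in repo_blobs:
        oid = parse_lfs_pointer(text)
        if oid is not None:
            referenced.add(oid)
    return set(tracked_oids) - referenced


if __name__ == "__main__":
    still_used = "a" * 64
    orphaned = "b" * 64
    pointer = f"{_VERSION_LINE}\noid sha256:{still_used}\nsize 12\n"
    print(unreferenced_oids({still_used, orphaned}, [pointer, "plain file contents"]))
```

Anything in the returned set is only a candidate for deletion. A real scan would have to cover every reachable ref and commit, which is where the performance concerns come from.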

Ideal solution

Track pointers the whole time and automatically delete objects once they are unreferenced. This is technically possible, and it is the most user-friendly solution. We started down this road in gitlab-foss!14479 (closed), but that is a large, complex MR and, from my limited understanding, more than weight 5, especially once performance validation and similar work are included. There are apparently a lot of performance risks and pitfalls. It also looks like existing repos weren't handled yet.

Proposal

Rather than focusing on the ideal solution first, we can iterate using a boring solution: create a rake task that cleans a single project in a non-performant way. This would also allow us to validate our performance concerns. From there, we could perhaps implement a way for project Maintainers to queue a clean, and run the queued cleans one at a time. It's not ideal, but users would at least have some recourse. It is similar to how we handle other cleanup tasks.
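The "queue a clean and run them one at a time" idea could look roughly like the sketch below. It is a hedged illustration, not a proposed implementation: `CleanupQueue` and the `clean_project` callable are hypothetical names, and the callable stands in for whatever the rake task would do for a single project.

```python
from collections import deque


class CleanupQueue:
    """FIFO queue of project IDs; a project can be pending at most once."""

    def __init__(self, clean_project):
        # clean_project is a hypothetical callable: project_id -> number of objects removed.
        self._clean_project = clean_project
        self._pending = deque()
        self._queued = set()

    def enqueue(self, project_id):
        """Queue a clean for project_id. Return False if one is already pending."""
        if project_id in self._queued:
            return False
        self._pending.append(project_id)
        self._queued.add(project_id)
        return True

    def run(self):
        """Process pending cleans strictly one at a time, in request order."""
        results = {}
        while self._pending:
            project_id = self._pending.popleft()
            self._queued.discard(project_id)
            results[project_id] = self._clean_project(project_id)
        return results


if __name__ == "__main__":
    queue = CleanupQueue(lambda project_id: 0)  # stand-in that removes nothing
    queue.enqueue(7)
    queue.enqueue(7)  # duplicate request is ignored
    queue.enqueue(3)
    print(queue.run())
```

Running cleans serially keeps the load of the non-performant scan bounded to one project at a time, at the cost of longer waits for Maintainers further back in the queue.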

This was already proposed in this comment: #8922 (comment 244261646)

The rake task could be sudo gitlab-rake gitlab:cleanup:lfs_files.

Permissions and Security

  • I believe this requires sudo access

Documentation

  • Add to https://docs.gitlab.com/ee/raketasks/cleanup.html

Testing

  • Needs to be well tested, because otherwise it may remove data that is still needed

What does success look like, and how can we measure that?

A systems administrator can schedule sudo gitlab-rake gitlab:cleanup:lfs_files to run on a per-project basis, e.g. over the weekend, to remove LFS files that are no longer required.

What is the type of buyer?

  • All tiers

Links / references

  • #17711 (closed)
  • Release Post MR: gitlab-com/www-gitlab-com!35814 (merged)
Edited Dec 09, 2019 by Michael Kozono