Storage measurement and management improvements
Overview
Over time, a GitLab instance can accumulate a significant amount of stored data. Activities such as pushing code, creating containers and packages, and running CI/CD jobs all contribute. This storage increases the cost of operating a GitLab instance, whether it is a self-managed instance or a large one like GitLab.com.
Providing tools to both administrators and users of the service is important so that storage usage can be understood and optimized.
Measurement
Administrators and users need to understand their storage usage before they can optimize it. We are actively working on improving how the GitLab application measures storage usage, and we are aware of a few challenges:
- Ensure namespace-level aggregate storage is accurate: https://gitlab.com/groups/gitlab-org/-/epics/8542
- Improve performance of registry measurement for namespaces with large numbers of images: &9415, &9105
- Investigate potential LFS duplication: https://gitlab.com/gitlab-org/gitlab/-/issues/372534
- Fix Package storage measurements: #363010 (closed), #368327 (closed), https://gitlab.com/groups/gitlab-org/-/epics/8627 (internal only)
- Fix Artifact storage measurements: https://gitlab.com/groups/gitlab-org/-/epics/8627 (internal only)
  - Do not count pipeline artifacts for now: https://gitlab.com/groups/gitlab-org/-/epics/10672
- Fix ability to reduce Git storage: gitaly#4824 (closed), https://gitlab.com/gitlab-org/gitlab/-/issues/351415
- Do not count projects marked for deletion towards usage quota: https://gitlab.com/gitlab-org/gitlab/-/issues/370178
- Provide a mechanism to reduce the storage contribution of forks: https://gitlab.com/gitlab-org/gitlab/-/issues/373914, !123981 (merged)
- Fix storage usage page to ensure it loads when dependency proxy is used: #412974 (closed)
- Utilize new storage size calculation for Git repositories: #418243 (closed)
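To make the idea of a namespace-level aggregate concrete, here is a minimal sketch that sums per-type storage across a list of project records and skips projects marked for deletion. It assumes each record resembles the JSON returned by the Projects API when statistics are requested. The key names in `SIZE_KEYS`, the `marked_for_deletion_on` field, and the `aggregate_storage` helper are assumptions for illustration, not GitLab's internal calculation; check the field names against your GitLab version.

```python
from collections import Counter

# Assumed statistic keys (sizes in bytes); verify against your GitLab version.
SIZE_KEYS = (
    "repository_size",
    "lfs_objects_size",
    "job_artifacts_size",
    "packages_size",
)


def aggregate_storage(projects):
    """Sum per-type storage across project dicts.

    Projects with a truthy "marked_for_deletion_on" value are skipped,
    mirroring the goal of not counting them towards the usage quota.
    """
    totals = Counter()
    for project in projects:
        if project.get("marked_for_deletion_on"):
            continue  # hypothetical rule: pending deletion does not count
        stats = project.get("statistics") or {}
        for key in SIZE_KEYS:
            totals[key] += stats.get(key, 0)
    return dict(totals)
```

Passing in records fetched for every project in a namespace would give a rough per-type breakdown that can be compared with the figures shown on the usage quota page.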
Management
Users need tools to manage storage effectively and easily. It is often best to support some form of automatic expiration, as GitLab does today for container images and build artifacts, so that old content that is no longer useful is removed.
GitLab supports tools to manage most storage types today, but we are working on a few improvements:
- Improve workflow for managing build artifacts: &8715
  - Build artifacts have their own expiration settings and can be deleted in bulk via the API or in batches of 50 via the UI.
- Introduce expiration setting for job logs, which count towards Artifact storage: #374717
  - Note: this is not strictly required, because deleting a pipeline via the API or UI also cleans up its job logs.
- Fix bug regarding "keep latest" with build artifacts: #266958 (closed)
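The API-based cleanup mentioned above can be sketched as follows. This example only builds the HTTP requests and does not send them. It assumes the REST endpoints `DELETE /projects/:id/artifacts` (bulk artifact deletion) and `DELETE /projects/:id/pipelines/:pipeline_id` (pipeline deletion) are available on your instance. The base URL, token value, and helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

API_PREFIX = "/api/v4"  # assumed REST API prefix


def _project_url(base_url, project):
    # Project may be a numeric ID or a path such as "group/app";
    # paths must be URL-encoded, including the slash.
    return f"{base_url.rstrip('/')}{API_PREFIX}/projects/{quote(str(project), safe='')}"


def delete_artifacts_request(base_url, project, token):
    """Build (but do not send) a request to bulk-delete a project's artifacts."""
    url = f"{_project_url(base_url, project)}/artifacts"
    return Request(url, method="DELETE", headers={"PRIVATE-TOKEN": token})


def delete_pipeline_request(base_url, project, pipeline_id, token):
    """Build (but do not send) a request to delete one pipeline and its job logs."""
    url = f"{_project_url(base_url, project)}/pipelines/{int(pipeline_id)}"
    return Request(url, method="DELETE", headers={"PRIVATE-TOKEN": token})
```

To execute a request, pass it to `urllib.request.urlopen`. Both operations are destructive, so it is worth confirming the project and pipeline IDs first.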