Forked repositories do not deduplicate LFS objects
GitLab currently deduplicates forked repositories by using shared pool repositories, which saves on storage costs. (Note: there is a bug in how this is measured, which is being fixed here: #368150 (comment 1054628587).) Unfortunately, this deduplication does not apply to LFS objects today.
Ideally, we should handle LFS objects the same way we handle repository storage:
- We should count only the LFS objects that the fork has added or changed against the fork's storage limit.
- We should deduplicate the stored LFS content itself, so that it is not replicated for every fork. As with pool repositories, this could speed up forking as well as reduce cost.
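To make the two points above concrete, here is a minimal sketch of what this could look like. It is not GitLab's implementation: the `LfsStore` class, its method names, and the project names are all hypothetical. The only fact it relies on is that a Git LFS object is identified by the SHA-256 of its content, so identical content always maps to the same OID.

```python
import hashlib


class LfsStore:
    """Hypothetical content-addressed LFS store shared by all projects."""

    def __init__(self):
        self.blobs = {}      # oid -> content, each blob stored exactly once
        self.links = {}      # project -> set of oids the project references
        self.inherited = {}  # fork -> oids it received from its parent at fork time

    def upload(self, project, content):
        # Git LFS OIDs are the SHA-256 of the object content.
        oid = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(oid, content)  # no second copy if already stored
        self.links.setdefault(project, set()).add(oid)
        return oid

    def fork(self, parent, fork):
        # Forking only copies references, never the blobs themselves.
        parent_oids = frozenset(self.links.get(parent, set()))
        self.links[fork] = set(parent_oids)
        self.inherited[fork] = parent_oids

    def counted_size(self, project):
        # Only objects the project did not inherit count against its limit.
        own = self.links.get(project, set()) - self.inherited.get(project, frozenset())
        return sum(len(self.blobs[oid]) for oid in own)

    def stored_size(self):
        # Bytes actually held in storage across all projects.
        return sum(len(blob) for blob in self.blobs.values())


store = LfsStore()
store.upload("parent", b"a" * 100)
store.upload("parent", b"b" * 50)
store.fork("parent", "fork")
store.upload("fork", b"c" * 30)

print(store.counted_size("parent"))  # size attributed to the parent
print(store.counted_size("fork"))    # size attributed to the fork
print(store.stored_size())           # total bytes stored once
```

In this sketch the inherited set is a snapshot taken at fork time, so the fork's counted size does not change if the parent later removes an object. Whether that is the right accounting rule is an open design choice, not something this issue has decided.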
How others handle LFS
It looks like GitHub attributes the forked project's changed LFS objects back to the parent project: https://docs.github.com/en/repositories/working-with-files/managing-large-files/collaboration-with-git-large-file-storage. I'm not entirely sure why; it may be related to how they charge for LFS.
Edited by Ian Pedowitz