Deduplication-friendly artifact storage

Description

I use GitLab artifacts to store release versions of our software. The artifact package contains everything needed to get the application up and running. This includes a large number of shared libraries (80%) that rarely change. Nevertheless, for every release they are stored again as part of a single zip package inside /var/opt/gitlab/gitlab-rails/shared/artifacts/

Each release is about 180 MB compressed and 500 MB uncompressed.

Proposal

Deduplication-friendly artifact storage.

Allow configuring the zip compression level for artifact storage. Artifacts should still be transferred with compression from the runner to the GitLab server, but it would be great to be able to define the compression level for the stored artifacts separately. This would allow me to set the zip level to zero (uncompressed).
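To illustrate what "zip level zero" means for a stored artifact, here is a minimal Python sketch that rewrites an existing zip archive so that every member is stored without compression. It is not how GitLab handles artifacts; the function name `repack_uncompressed` is hypothetical, and the sketch only uses Python's standard `zipfile` module.

```python
import io
import zipfile


def repack_uncompressed(src: bytes) -> bytes:
    """Return a copy of the zip archive `src` with all members stored
    uncompressed (ZIP_STORED). Hypothetical helper for illustration only."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zout:
        for info in zin.infolist():
            # Member contents are copied unchanged; only the storage method
            # differs. Timestamps and permissions are not preserved here.
            zout.writestr(info.filename, zin.read(info),
                          compress_type=zipfile.ZIP_STORED)
    return out.getvalue()
```

The repacked archive is larger than the original, but the raw bytes of each file appear in it unmodified, which is what a deduplicating storage layer would need to see.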

While storing 500 MB instead of 180 MB per release sounds like a bad idea at first, it makes sense because our GitLab instance lies on a deduplication storage. Although the file system itself would use more space, the space it requires on the deduplication storage would decrease, because the unchanged shared libraries could be deduplicated across releases.
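A rough back-of-the-envelope estimate of this trade-off can be sketched in Python. The figures 180 MB, 500 MB and 80% come from the description above; everything else is an assumption made for illustration: that the 80% refers to the uncompressed size, that the storage deduplicates the unchanged part perfectly, and that compressed packages are not deduplicated at all. Real results will depend on the storage system.

```python
def compressed_total_mb(releases: int, compressed_mb: float = 180) -> float:
    """Space used when every release is stored as its own compressed package
    (assumption: no deduplication between compressed packages)."""
    return releases * compressed_mb


def deduplicated_total_mb(releases: int, uncompressed_mb: float = 500,
                          shared_percent: int = 80) -> float:
    """Space used when releases are stored uncompressed and the shared part
    is deduplicated perfectly (idealized assumption)."""
    if releases <= 0:
        return 0.0
    # Only the part that differs between releases costs new space.
    changed_mb = uncompressed_mb * (100 - shared_percent) / 100
    return uncompressed_mb + (releases - 1) * changed_mb


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        print(n, compressed_total_mb(n), deduplicated_total_mb(n))
```

Under these idealized assumptions the two approaches break even at five releases (900 MB each); with ten releases the compressed packages take 1800 MB while the deduplicated uncompressed ones take 1400 MB.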

Not everybody has a deduplication storage. However, there are standalone file systems with deduplication support, such as Btrfs or OpenDedup, so even small teams could benefit.

Many backup solutions also perform deduplication, so this feature could reduce backup size as well.

Links / references

https://btrfs.wiki.kernel.org/index.php/Deduplication
http://opendedup.org/odd/2016/09/15/filesystem/