Replace zlib with zstd for compression in Git
An old thread with proof-of-concept (POC) patches from Peff can be found here.
To quote Peff:
So saving 10% here really _isn't_ that interesting. I mostly wanted to confirm that we could use zstd without increasing the CPU time used for deflating, so that we could reap the benefits on the inflate side. Which is definitely the case.

With these numbers, there's basically no downside at all to using zstd. It's just faster to read the objects later. If we were designing git today, it seems like a no-brainer to use zstd over zlib.

But given backwards-compatibility issues, I'm not sure. 10-20% speedup on reading is awfully nice, but I don't think there's a good way to gracefully transition, because zlib is part of the on-the-wire format for serving objects. We could re-compress on the fly, but that gets expensive (in existing cases, we can quite often serve the zlib content straight from disk, but this would require an extra inflate/deflate. At least we wouldn't have to reconstitute objects from deltas, though).

A transition would probably look something like:

0. The patch below, or something like it, to teach git to read both zlib and zstd, and optionally write zstd. We'd probably want to make this an unconditional requirement like zlib, because the point is for it to be available everywhere (I assume the zstd code is pretty portable, but I haven't put it to the test).

1. Another patch to add a "zstd" capability to the protocol. This would require teaching pack-objects an option to convert zstd back to zlib on the fly. Servers which handle a limited number of updated clients can switch to zstd immediately to get the benefit, and their clients can handle it directly. Likewise, clients of older servers may wish to repack locally using zstd to get the benefit. They'll have to recompress on the fly during push, but pushes are rarer than other operations (and often limited by bandwidth anyway).

2. After a while, eventually flip the default to zstd=5.

3. If "a while" is long enough, perhaps add a patch to let servers tell clients "go upgrade" rather than recompressing on the fly.

I don't have immediate plans for any of that, but maybe something to think about.
In short, switching the compression algorithm from zlib to Zstandard (zstd) appears to give a 10-20% speedup when reading objects, with no extra CPU time spent on compression. The hard part is driving such a migration while staying backward compatible with existing clients that use zlib.
Perhaps this is something the GitLab folks might want to tackle?
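
Step 0 of the transition quoted above needs a reader that accepts both formats. One way this might work is to look at the first bytes of the compressed stream: a zstd frame starts with a fixed 4-byte magic number, while a zlib stream starts with a 2-byte header whose value is a multiple of 31 (RFC 1950). The sketch below is in Python rather than Git's C, is not taken from Peff's patches, and uses hypothetical helper names (`detect_compression`, `inflate`). It only recognises the zstd case, since decoding it would need a zstd binding that is assumed not to be available here.

```python
import zlib

# zstd frame magic number 0xFD2FB528, stored little-endian on disk.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(data: bytes) -> str:
    """Guess the compression format from the leading bytes of a stream."""
    if data[:4] == ZSTD_MAGIC:
        return "zstd"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: method 8 (deflate), window bits <= 7,
        # and the 16-bit header must be divisible by 31.
        if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0:
            return "zlib"
    return "unknown"


def inflate(data: bytes) -> bytes:
    """Decompress a stream, dispatching on the detected format."""
    kind = detect_compression(data)
    if kind == "zlib":
        return zlib.decompress(data)
    if kind == "zstd":
        # Assumption: no zstd binding is installed, so this is only a stub.
        raise NotImplementedError("zstd decoding needs a zstd library")
    raise ValueError("unrecognized compression header")
```

The first two bytes of the zstd magic (0x28, 0xB5) do not form a valid zlib header, because 0x28B5 is not divisible by 31, so the two checks cannot both match the same stream. Existing zlib data would keep going through the `zlib` branch unchanged, which is the property a gradual migration would rely on.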