Use delta islands on all repositories, independently of object deduplication
As discussed in https://gitlab.com/gitlab-org/gitlab-ce/issues/55754#note_129346383, I expect a major speedup from this. In my simulation, the server-side computation time for a clone of gitlab-ce went from 18 seconds to 6 seconds. From the client's perspective the clone takes longer than this, but that is due to other factors, such as the client's network connection and how fast Git on the client machine can process the incoming clone.
The next question is how we would deploy and use this. Making the config change that adds delta islands is not enough. After configuring the islands, we must force Git to recalculate the deltas (with the `-f` flag on `git repack`); otherwise the island-aware deltas never get created and there is no benefit.
So besides the config change, we also need to run `git repack -A -f -d` periodically. This is much more expensive than what we do now during housekeeping, because delta compression uses a lot of CPU; normally Git just reuses existing delta chains.
I think the forced repack has to be periodic, because over time people delete or force-push branches, which can again lead to delta chain "pollution".
Maybe we can simply modify the RepackFull RPC to do a forced recompression and apply the delta island config at the same time. That RPC does not run often, so hopefully we can absorb the extra cost of the recompression. Done this way, the change would be confined to Gitaly.
Original text:
As I understand the Git delta islands feature, its purpose is to prevent delta chains that cross from "commits the end user is unlikely or guaranteed not to have access to" to "commits the end user wants". We've been discussing this in the context of pool repositories for object deduplication.
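To make the "no crossing" rule concrete, here is a simplified model of how I understand the check in Git's `island.c` (`in_same_island`): an object may be stored as a delta against a base only if the base is in every island the object is in; an object in no island may use any base. This is a sketch using Go maps instead of Git's bitmaps; `islandSet`, `canDeltaAgainst`, and the island name `fetchable` are hypothetical.

```go
package main

import "fmt"

// islandSet is the set of islands an object belongs to, i.e. the islands of
// the refs it is reachable from. Simplified stand-in for Git's island bitmaps.
type islandSet map[string]bool

// canDeltaAgainst reports whether target may be stored as a delta against
// base: every island of the target must also contain the base.
func canDeltaAgainst(target, base islandSet) bool {
	if len(target) == 0 {
		// Target is in no island (e.g. only reachable from hidden refs),
		// so any base is acceptable.
		return true
	}
	for name := range target {
		if !base[name] {
			// A client fetching this island might not have the base.
			return false
		}
	}
	return true
}

func main() {
	fetchable := islandSet{"fetchable": true} // reachable from branches/tags
	hidden := islandSet{}                     // reachable only from e.g. tmp refs
	fmt.Println(canDeltaAgainst(fetchable, hidden))
	fmt.Println(canDeltaAgainst(hidden, fetchable))
	fmt.Println(canDeltaAgainst(fetchable, fetchable))
}
```

This prints `false`, `true`, `true`: fetchable objects never depend on hidden ones, while hidden objects may still be compressed against fetchable ones.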
I wonder if this would also be useful in ordinary repositories. We store lots of refs in repositories that the user normally won't fetch, such as tmp refs, keepalive refs, and deployment refs. We could use delta islands to create an island containing the `refs/(heads|tags)` namespace, because that is all people normally fetch. What do you think, @chriscool?
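A sketch of how refs would be assigned to islands under such patterns, following the naming rule in the `git pack-objects` documentation: each `pack.island` regex is matched from the start of the ref name, and the island name is the regex's capture groups joined with `-` (the empty string if there are none). One consequence: `refs/(heads|tags)/` has a capture group, so it yields two islands, `heads` and `tags`, whereas two entries `refs/heads/` and `refs/tags/` yield one. Assumptions: Go's RE2 syntax stands in for POSIX ERE (the patterns here are valid in both), precedence among several matching patterns is not modelled, and the ref names are made-up examples.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// islandName returns the island a ref falls into, and whether it matched any
// pattern at all. Hypothetical helper; first matching pattern wins here.
func islandName(ref string, patterns []string) (string, bool) {
	for _, p := range patterns {
		if !strings.HasPrefix(p, "^") {
			p = "^" + p // Git anchors island regexes at the start of the ref
		}
		m := regexp.MustCompile(p).FindStringSubmatch(ref)
		if m == nil {
			continue
		}
		// Island name = capture groups joined with "-" ("" if none).
		return strings.Join(m[1:], "-"), true
	}
	return "", false
}

func main() {
	refs := []string{
		"refs/heads/master",
		"refs/tags/v1.0.0",
		"refs/tmp/abc",
		"refs/keep-around/0123abcd",
		"refs/environments/production/deployments/42",
	}
	twoIslands := []string{"refs/(heads|tags)/"}
	oneIsland := []string{"refs/heads/", "refs/tags/"}
	for _, ref := range refs {
		a, okA := islandName(ref, twoIslands)
		b, okB := islandName(ref, oneIsland)
		fmt.Printf("%-45s %q %v | %q %v\n", ref, a, okA, b, okB)
	}
}
```

Only the branch and tag refs land in an island; the tmp, keep-around, and deployment refs match nothing, so objects reachable only from them cannot serve as delta bases for fetchable objects.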