
quarantine: Implement Git quarantine object directories

Patrick Steinhardt requested to merge pks-git-quarantine into master

When receiving objects via git-receive-pack(1), Git sets up a temporary quarantine object directory. This directory serves as a staging area: the objects are migrated from the quarantine object directory into the main repository only once the push has been accepted. The advantage is that potentially corrupt or otherwise unacceptable objects never end up in the repository, which could happen if we accepted these objects directly into the main object database.

At GitLab, the object quarantine directory serves a second major purpose: it speeds up access checks. When calling out to Rails' access checks, Rails needs to determine which objects are new. This is typically done with a query like git rev-list $NEWOIDS --not --all, but the cost of that query scales with the number of preexisting references. We have therefore landed optimizations that instead iterate through the objects in the quarantine directory to tell which objects are new, so that the cost now scales with the number of new objects.

While these performance improvements are significant, they only apply to pushes: when objects are created by other means, for example via OperationService RPCs, there is no object quarantine directory. This is why we now implement the logic to manually set up quarantine directories for such RPCs in the form of a new quarantine package.

Implementation-wise, quarantine directories are in fact quite simple: Git creates a temporary object directory at a specific place and sets the GIT_OBJECT_DIRECTORY environment variable to point to this temporary directory, while GIT_ALTERNATE_OBJECT_DIRECTORIES is adjusted to point at the main repository's object directory. As a result, all newly created objects end up in the temporary directory, while existing objects remain readable through the alternate. After all objects have been written, they are migrated by linking or renaming them into the correct place.

The quarantine package implements essentially the same approach: we create a temporary quarantine directory via the tempdir package and create a quarantined Repository protobuf that has the above variables set to the correct locations. Finally, a call to Migrate() moves all objects from the temporary directory into the main repository and thus wraps things up. Everything else is handled by our Git ExecCommandFactory, which already knows how to set the required environment variables.

As an example RPC, this MR converts UserCreateTag to use an object quarantine directory.

Part of #3691 (closed)
