`git_repo` cannot handle tracking a branch of a large repo

Background

The stable Linux kernel repo has a special branch named linux-rolling-stable that contains the latest version of the kernel that the maintainers deem stable enough to be "ready" for everyday use by distros. Usually this means that when a new version of the kernel comes out (e.g. 6.3.0), linux-rolling-stable stays on the previous release series for a few more point releases. The number of releases by which linux-rolling-stable "lags" isn't necessarily consistent. In short, the branch will look something like this: ... -> 6.2.16 -> 6.3.2 -> ... -> 6.3.25 -> 6.4.3 -> ... -> 6.4.18 -> 6.5.2 -> ...

To follow the linux-rolling-stable branch, I have the following source set up:

- kind: git_repo
  url: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  track: refs/heads/linux-rolling-stable
  ref-format: git-describe

Issue 1: Fails to fetch tags

I expect to see a ref like this in my project.refs:

ref: v6.4.9-1-gf81dd77a54dc1b07631bd269cfc001434e4cf5b8

The linux-rolling-stable branch consists of the commit tagged v6.4.9 plus one more commit, which is simply a merge commit (it merges the v6.4.9 tag into the linux-rolling-stable branch). Unfortunately, instead of that ref I get the following error:

[--:--:--][7516467c][   track:boards/amd64/kernel/linux.bst ] STATUS  Tracking linux-rolling-stable from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--][7516467c][   track:boards/amd64/kernel/linux.bst ] STATUS  Tracked refs/heads/linux-rolling-stable: f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--][7516467c][   track:boards/amd64/kernel/linux.bst ] START   Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--][7516467c][   track:boards/amd64/kernel/linux.bst ] STATUS  Fetching f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--][7516467c][   track:boards/amd64/kernel/linux.bst ] STATUS  Fetching 1116 extra tags
[00:06:09][7516467c][   track:boards/amd64/kernel/linux.bst ] FAILURE Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[00:06:09][7516467c][   track:boards/amd64/kernel/linux.bst ] FAILURE failed to fetch: unexpected http resp 413 for https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/git-upload-pack

    Printing the last 20 lines from log file:
    /var/home/adrian/.cache/buildstream/logs/carbonOS/boards-amd64-kernel-linux/7516467c-track.1133740.log
    ======================================================================
    BuildStream 2.0.1 - Wednesday, 09-08-2023 at 18:29:33
    [--:--:--] START   [7516467c] boards/amd64/kernel/linux.bst: Track
    [--:--:--] STATUS  boards/amd64/kernel/linux.bst: Tracking linux-rolling-stable from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    [--:--:--] STATUS  boards/amd64/kernel/linux.bst: Tracked refs/heads/linux-rolling-stable: f81dd77a54dc1b07631bd269cfc001434e4cf5b8
    [--:--:--] START   boards/amd64/kernel/linux.bst: Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    [--:--:--] STATUS  boards/amd64/kernel/linux.bst: Fetching f81dd77a54dc1b07631bd269cfc001434e4cf5b8
    [--:--:--] STATUS  boards/amd64/kernel/linux.bst: Fetching 1116 extra tags
    [00:06:09] FAILURE boards/amd64/kernel/linux.bst: Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    [00:06:09] FAILURE [7516467c] boards/amd64/kernel/linux.bst: failed to fetch: unexpected http resp 413 for https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/git-upload-pack
    ======================================================================

Track failure on element: boards/amd64/kernel/linux.bst

If I hit retry on the error, it'll track successfully, but end up with a different ref:

ref: f81dd77a54dc1b07631bd269cfc001434e4cf5b8

I suspect the change in behavior is because the GitMirror.fetch function has a guard condition that checks if the commit is already in the repo. The first time around it isn't, so we continue into the function and try to fetch the commit (which succeeds) and tags (which errors out). The second time around, we already have the commit, so GitMirror.fetch exits early. Then later when git_describe is called, it doesn't find any tags, and thus just returns the commit hash.
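To make the suspected sequence concrete, here is a toy model of it. `FakeMirror` and everything in it are hypothetical stand-ins written for this issue, not the real GitMirror code; the only things taken from above are the early-exit guard, the failing tag fetch, and the fallback to the bare hash.

```python
class FakeMirror:
    """Hypothetical stand-in for a local mirror; NOT the real GitMirror."""

    def __init__(self):
        self.commits = set()
        self.tags = {}  # tag name -> commit sha

    def fetch(self, sha, tags_fail):
        # Guard: if the commit is already present, do nothing at all.
        if sha in self.commits:
            return
        # Fetching the commit itself succeeds...
        self.commits.add(sha)
        # ...but fetching the extra tags fails (the HTTP 413 above).
        if tags_fail:
            raise RuntimeError("unexpected http resp 413")
        self.tags["v6.4.9"] = "parent-of-" + sha

    def describe(self, sha):
        # With no tags in the mirror, fall back to the bare commit hash.
        if not self.tags:
            return sha
        return "v6.4.9-1-g" + sha


def track_twice(sha):
    """First attempt fails on tags; the retry is short-circuited by the guard."""
    mirror = FakeMirror()
    try:
        mirror.fetch(sha, tags_fail=True)
    except RuntimeError:
        pass  # the user sees the failure and hits retry
    mirror.fetch(sha, tags_fail=True)  # returns early, so no error this time
    return mirror.describe(sha)
```

In this model the retry "succeeds" only because the tag fetch is never attempted again, which would explain the bare hash ending up in project.refs.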

Issue 2: Performance!

On my machine, tracking refs/heads/linux-rolling-stable takes ~6 minutes (just on the first call to client.fetch!), whereas tracking & fetching refs/tags/v* takes ~1 minute (for the whole thing!). I suspect this is because we don't have any depth information available in the branch case. Let me clarify:

When we're tracking a tag, we set the ref to {tag}-0-g{sha} and do no fetching. At fetch time, that 0 value is interpreted as a depth of 0, so we fetch only the one commit that sha identified at track time and none of its parent commits. Since we tracked a tag, a tag is immediately available at that commit. We're done.

When we're tracking a branch, however, we only have the sha hash. We have no depth information available, so when we try to fetch this commit specified by sha we pull down ALL the parent commits of this one. In the Linux kernel's case, this is a huge number of commits! This is what takes 6 minutes. Later we decide which tags we want to pull down. This is a huge number of tags! We don't actually need any of these. This whole process ends up being quite wasteful...

Assuming the track has worked, we will end up with a ref in the same format as when tracking a tag. Thus, someone simply fetching will have no performance issues. They will download the commit specified by sha, plus enough commits to get the tag.
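As a sketch of the depth information I mean: a ref in git-describe form carries the tag, the number of commits past it, and the hash, while a bare hash carries nothing. The helper below is my own illustration (the name `parse_describe_ref` and the regex are assumptions, not plugin code).

```python
import re

# {tag}-{depth}-g{sha}, as in v6.4.9-1-gf81dd77a...
_DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<depth>\d+)-g(?P<sha>[0-9a-f]{7,40})$"
)


def parse_describe_ref(ref):
    """Split a git-describe style ref into (tag, depth, sha).

    For a bare commit hash there is no tag or depth to recover, so the
    result is (None, None, ref).
    """
    match = _DESCRIBE_RE.match(ref)
    if match is None:
        return None, None, ref
    return match.group("tag"), int(match.group("depth")), match.group("sha")
```

With the ref I expected from Issue 1, this gives a depth of 1; with the bare hash I actually got after the retry, there is no depth to work with at all.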

Workaround

For now my workaround looks like this:

- kind: git_repo
  url: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  track: refs/tags/v*
  ref-format: git-describe
  exclude:
  - v*-rc*
  - v*.*.[01]

This solves both issues, and tries its best to emulate linux-rolling-stable's "lag". However, I would much prefer to rely on the work the kernel maintainers are already doing by tracking the linux-rolling-stable branch instead.
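For anyone wondering what those exclude patterns filter out, here is a quick check. It assumes the patterns behave like shell-style globs as implemented by Python's `fnmatch`; I have not confirmed that the plugin matches them exactly this way, and the tag list is made up for illustration.

```python
from fnmatch import fnmatchcase

TRACK = "v*"
EXCLUDE = ["v*-rc*", "v*.*.[01]"]


def is_excluded(tag, patterns=EXCLUDE):
    """True if the tag matches any of the exclude globs."""
    return any(fnmatchcase(tag, pattern) for pattern in patterns)


def candidate_tags(tags):
    """Tags that match the track glob and survive the exclude list."""
    return [t for t in tags if fnmatchcase(t, TRACK) and not is_excluded(t)]
```

Under that assumption, release candidates and any tag ending in .0 or .1 are dropped, while something like v6.4.9 or v6.4.10 is kept.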

Proposed solution

I think it should be possible to solve both issues by passing a depth to the GitMirror class when tracking a branch. Maybe just a new config key on the source, branch-depth? I know that the linux-rolling-stable branch is always going to be just 1 commit past a tag, so I would set branch-depth: 1 to avoid downloading the entire commit history of the current kernel release...
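Roughly what I have in mind, expressed with the plain git CLI purely for illustration (this is not how the plugin performs its fetches, and `branch_depth` is the hypothetical key proposed above). The one subtlety is that git's `--depth` counts commits including the tip, so "1 commit past a tag" needs `--depth 2`.

```python
def shallow_fetch_command(url, branch, branch_depth):
    """Build a `git fetch` argv that limits history below the branch tip.

    branch_depth is the hypothetical config value: how many generations of
    parents to fetch below the tracked commit. git's --depth counts commits
    including the tip itself, hence the +1.
    """
    if branch_depth < 0:
        raise ValueError("branch_depth must be non-negative")
    return ["git", "fetch", "--depth", str(branch_depth + 1), url, branch]
```

For example, `shallow_fetch_command(url, "refs/heads/linux-rolling-stable", 1)` asks for the merge commit and its direct parents only, rather than the whole kernel history.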

Thoughts?

Edited by Adrian Vovk