`git_repo` cannot handle tracking a branch of a large repo
Background
The stable Linux kernel repo has a special branch named `linux-rolling-stable` that, basically, contains the latest version of the kernel that the maintainers deem stable enough to be "ready" for everyday use by distros. Usually this means that when a new version of the kernel comes out (e.g. `6.3.0`), `linux-rolling-stable` will stay on the previous release for a few versions. The number of releases that `linux-rolling-stable` "lags" isn't necessarily consistent. In short, the branch will look something like this: `... -> 6.2.16 -> 6.3.2 -> ... -> 6.3.25 -> 6.4.3 -> ... -> 6.4.18 -> 6.5.2 -> ...`
To follow the `linux-rolling-stable` branch, I have the following source set up:
- kind: git_repo
  url: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  track: refs/heads/linux-rolling-stable
  ref-format: git-describe
Issue 1: Fails to fetch tags
I expect to see a ref like this in my project.refs:
ref: v6.4.9-1-gf81dd77a54dc1b07631bd269cfc001434e4cf5b8
The `linux-rolling-stable` branch has the commit of the `v6.4.9` tag, plus one more commit, which is simply a merge commit (merging the `v6.4.9` tag into the `linux-rolling-stable` branch). Unfortunately, I get the following error:
[--:--:--][7516467c][ track:boards/amd64/kernel/linux.bst ] STATUS Tracking linux-rolling-stable from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--][7516467c][ track:boards/amd64/kernel/linux.bst ] STATUS Tracked refs/heads/linux-rolling-stable: f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--][7516467c][ track:boards/amd64/kernel/linux.bst ] START Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--][7516467c][ track:boards/amd64/kernel/linux.bst ] STATUS Fetching f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--][7516467c][ track:boards/amd64/kernel/linux.bst ] STATUS Fetching 1116 extra tags
[00:06:09][7516467c][ track:boards/amd64/kernel/linux.bst ] FAILURE Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[00:06:09][7516467c][ track:boards/amd64/kernel/linux.bst ] FAILURE failed to fetch: unexpected http resp 413 for https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/git-upload-pack
Printing the last 20 lines from log file:
/var/home/adrian/.cache/buildstream/logs/carbonOS/boards-amd64-kernel-linux/7516467c-track.1133740.log
======================================================================
BuildStream 2.0.1 - Wednesday, 09-08-2023 at 18:29:33
[--:--:--] START [7516467c] boards/amd64/kernel/linux.bst: Track
[--:--:--] STATUS boards/amd64/kernel/linux.bst: Tracking linux-rolling-stable from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--] STATUS boards/amd64/kernel/linux.bst: Tracked refs/heads/linux-rolling-stable: f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--] START boards/amd64/kernel/linux.bst: Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[--:--:--] STATUS boards/amd64/kernel/linux.bst: Fetching f81dd77a54dc1b07631bd269cfc001434e4cf5b8
[--:--:--] STATUS boards/amd64/kernel/linux.bst: Fetching 1116 extra tags
[00:06:09] FAILURE boards/amd64/kernel/linux.bst: Fetching from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[00:06:09] FAILURE [7516467c] boards/amd64/kernel/linux.bst: failed to fetch: unexpected http resp 413 for https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/git-upload-pack
======================================================================
Track failure on element: boards/amd64/kernel/linux.bst
If I hit retry on the error, it'll track successfully, but end up with a different ref:
ref: f81dd77a54dc1b07631bd269cfc001434e4cf5b8
I suspect the change in behavior is because the `GitMirror.fetch` function has a guard condition that checks whether the commit is already in the repo. The first time around it isn't, so we continue into the function and try to fetch the commit (which succeeds) and the tags (which errors out). The second time around, we already have the commit, so `GitMirror.fetch` exits early. Later, when `git_describe` is called, it doesn't find any tags and thus just returns the commit hash.
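To make the suspected sequence concrete, here is a toy model of it. This is only a sketch: `ToyMirror`, `TagFetchError`, and the `tag_fetch_fails` flag are hypothetical names invented for illustration, and the code does not reproduce the real `GitMirror` implementation. It only mirrors the guard-then-fetch order described above.

```python
class TagFetchError(Exception):
    """Stands in for the HTTP 413 failure seen while fetching extra tags."""


class ToyMirror:
    """Hypothetical, heavily simplified stand-in for GitMirror."""

    def __init__(self):
        self.commits = set()
        self.tags = set()

    def fetch(self, sha, tag_fetch_fails):
        # Guard condition: exit early if the commit is already in the repo.
        if sha in self.commits:
            return
        # Step 1: the commit itself is fetched successfully.
        self.commits.add(sha)
        # Step 2: the tag fetch fails, but the commit from step 1 stays behind.
        if tag_fetch_fails:
            raise TagFetchError("unexpected http resp 413")
        self.tags.add("v6.4.9")

    def describe(self, sha):
        # With no tags available, fall back to the bare commit hash.
        if not self.tags:
            return sha
        tag = sorted(self.tags)[-1]
        # The distance of 1 is hard-coded for this toy case.
        return f"{tag}-1-g{sha}"


SHA = "f81dd77a54dc1b07631bd269cfc001434e4cf5b8"
mirror = ToyMirror()

try:
    mirror.fetch(SHA, tag_fetch_fails=True)
except TagFetchError:
    first_attempt = "failed"
else:
    first_attempt = "ok"

# Retry: the guard sees the commit and returns before any tags are fetched.
mirror.fetch(SHA, tag_fetch_fails=True)
retry_ref = mirror.describe(SHA)
print(first_attempt, retry_ref)
```

In this model the retry "succeeds" only because the guard skips the tag fetch entirely, which matches the bare-hash ref I end up with.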
Issue 2: Performance!
On my machine, tracking `refs/heads/linux-rolling-stable` takes ~6 minutes (just on the first call to `client.fetch`!), whereas tracking & fetching `refs/tags/v*` takes ~1 minute (for the whole thing!). I suspect this is because we don't have any depth information available in the branch case. Let me clarify:
When we're tracking a tag, we set the ref to `{tag}-0-g{sha}` and do no fetching. At fetch time, that `0` value is interpreted as a depth of 0. So, we only fetch the one commit specified by `sha` during the track and 0 of its parent commits. Since we tracked a tag, we've immediately got a tag available to us here. We're done.
When we're tracking a branch, however, we only have the `sha` hash. We have no depth information available, so when we try to fetch the commit specified by `sha` we pull down ALL the parent commits of this one. In the Linux kernel's case, this is a huge number of commits! This is what takes 6 minutes. Later we decide which tags we want to pull down. This is a huge number of tags! We don't actually need any of these. This whole process ends up being quite wasteful...
Assuming the track has worked, we will end up with a ref in the same format as when tracking a tag. Thus, someone simply fetching will have no performance issues. They will download the commit specified by `sha`, plus enough commits to get the tag.
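The difference between the two cases comes down to whether the ref carries a depth. As a rough sketch (the `parse_ref` helper and its regex are my own illustration, not BuildStream's actual parsing code), a `git-describe` style ref splits into a tag, a depth, and a commit, while a bare hash yields no depth at all:

```python
import re

# {tag}-{depth}-g{sha}; the tag itself may contain hyphens (e.g. v6.4-rc1),
# so the depth and sha are anchored at the end of the string.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<depth>\d+)-g(?P<sha>[0-9a-f]{40})$")


def parse_ref(ref):
    """Return (tag, depth, sha); tag and depth are None for a bare hash."""
    match = DESCRIBE_RE.match(ref)
    if match is None:
        # No depth information: this is the branch-tracking situation.
        return None, None, ref
    return match.group("tag"), int(match.group("depth")), match.group("sha")


SHA = "f81dd77a54dc1b07631bd269cfc001434e4cf5b8"
print(parse_ref(f"v6.4.9-1-g{SHA}"))  # tag, depth and sha are all known
print(parse_ref(SHA))                 # only the sha is known
```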
Workaround
For now my workaround looks like this:
- kind: git_repo
  url: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  track: refs/tags/v*
  ref-format: git-describe
  exclude:
  - v*-rc*
  - v*.*.[01]
This solves both issues, and tries its best to emulate `linux-rolling-stable`'s "lag". However, I would much prefer to rely on the work the kernel maintainers are already doing by tracking the `linux-rolling-stable` branch instead.
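For anyone wanting to check what the exclude patterns filter out, here is a small sketch. It assumes the patterns behave like shell-style globs and uses Python's `fnmatch` as an approximation; the real matching in `git_repo` may differ in details, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The exclude patterns from the workaround above.
EXCLUDE = ["v*-rc*", "v*.*.[01]"]


def is_excluded(tag):
    """True if the tag matches any exclude pattern (glob semantics assumed)."""
    return any(fnmatchcase(tag, pattern) for pattern in EXCLUDE)


tags = ["v6.4-rc7", "v6.4", "v6.4.1", "v6.4.2", "v6.4.10"]
kept = [tag for tag in tags if not is_excluded(tag)]
print(kept)
```

Under this assumption, release candidates and the `.0`/`.1` point releases are skipped, while `v6.4.10` is kept because `[01]` matches a single character only. A two-component tag such as `v6.4` matches neither pattern.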
Proposed solution
I think it should be possible to solve both issues by specifying a depth to the `GitMirror` class (here) when tracking a branch. Maybe just a new config key added to the source, `branch-depth`? I know that the `linux-rolling-stable` branch is always going to be just 1 commit past a tag, so I would set `branch-depth: 1` to avoid downloading the entire commit history of the current kernel release...
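To illustrate the intent in plain git terms, here is a sketch of what a depth-limited fetch could look like. `shallow_fetch_args` is a hypothetical helper, `branch-depth` is only the key proposed above, and mapping it onto `git fetch --depth` is my assumption about the semantics; the plugin may well not shell out to git at all. Git's `--depth` counts commits from the tip, so "the tip plus N generations of parents" corresponds to `--depth=N+1`.

```python
def shallow_fetch_args(url, ref, branch_depth):
    """Build a git command line fetching `ref` plus `branch_depth` parent generations."""
    if branch_depth < 0:
        raise ValueError("branch_depth must be >= 0")
    # --depth=1 fetches only the tip commit, hence the +1.
    return ["git", "fetch", f"--depth={branch_depth + 1}", url, ref]


args = shallow_fetch_args(
    "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
    "refs/heads/linux-rolling-stable",
    branch_depth=1,
)
print(" ".join(args))
```

With `branch-depth: 1` this would cover the merge commit at the tip of the branch and its direct parents, one of which is the tagged commit.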
Thoughts?