Checkoutless Extractor
To evade the hg update
bugs around subrepos (as tickled by the
hg-regress-patho
tests), and thus also elide the need for the "try
hard to fake an update" fallback code, the extractor no longer
checks out an entire revision to disk.
Instead, low-level VCS commands are used to collect a manifest for
the revision and to read blob contents. In the case of GitExtractor
this also lets us avoid calculating a sha1 of each file at every
revision, since the blob hashes that git ls-tree
reports are good
enough for the job. We can't do this for HgExtractor
, because the
hg manifest
hashes include metadata (at least the filename)
meaning that a file move will cause us to create a duplicate blob
even if the contents did not change; so instead we have to hg cat
each file twice, once to feed it to sha1.Hash
(in manifest()
) and
once to copy it into the blobfile (in catFile()
).