
Checkoutless Extractor

Edward Cree requested to merge ecree/reposurgeon:newextractor into master

To evade the hg update bugs around subrepos (as tickled by the hg-regress-patho tests), and thus also elide the need for the "try hard to fake an update" fallback code, the extractor no longer checks out an entire revision to disk.

Instead, low-level VCS commands are used to collect a manifest for the revision and to read blob contents. In the case of GitExtractor this also lets us avoid calculating a sha1 of each file at every revision, since the blob hashes that git ls-tree reports are good enough for the job. We can't do this for HgExtractor: the hg manifest hashes include metadata (at least the filename), so a file move would cause us to create a duplicate blob even if the contents did not change. Instead we have to hg cat each file twice, once to feed it to sha1.Hash (in manifest()) and once to copy it into the blobfile (in catFile()).
