    fsck: detect gitmodules files · 159e7b08
    Jeff King authored
    
    
    In preparation for performing fsck checks on .gitmodules
    files, this commit plumbs in the actual detection of the
    files. Note that unlike most other fsck checks, this cannot
    be a property of a single object: we must know that the
    object is found at a ".gitmodules" path at the root tree of
    a commit.
    
    Since the fsck code only sees one object at a time, we have
    to mark the related objects to fit the puzzle together. When
    we see a commit we mark its tree as a root tree, and when
    we see a root tree with a .gitmodules file, we mark the
    corresponding blob to be checked.
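
    As a rough illustration of the marking scheme described above,
    here is a minimal sketch in C. The object model, the flag names
    (is_root_tree, check_gitmodules) and the visit() function are
    hypothetical stand-ins invented for this example; they are not
    Git's real data structures or the code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy object model; not Git's real structures. */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB };

struct object;

struct tree_entry {
	const char *name;
	struct object *obj;
};

struct object {
	enum obj_type type;
	int is_root_tree;           /* set when a commit points at this tree */
	int check_gitmodules;       /* set when a root tree names this blob ".gitmodules" */
	struct object *tree;        /* commits only: the root tree */
	struct tree_entry *entries; /* trees only */
	size_t nr_entries;
};

/* Visit one object at a time, leaving marks on related objects. */
static void visit(struct object *o)
{
	size_t i;

	switch (o->type) {
	case OBJ_COMMIT:
		assert(o->tree != NULL);
		o->tree->is_root_tree = 1;
		break;
	case OBJ_TREE:
		/* Only trees already known to be root trees are scanned. */
		if (!o->is_root_tree)
			break;
		for (i = 0; i < o->nr_entries; i++)
			if (!strcmp(o->entries[i].name, ".gitmodules"))
				o->entries[i].obj->check_gitmodules = 1;
		break;
	case OBJ_BLOB:
		/* Content checks would happen here if check_gitmodules is set. */
		break;
	}
}
```

    Note that in this sketch a tree visited before its commit is not
    yet known to be a root tree, so it has to be visited again later.
    That order dependence is the problem discussed next.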
    
    In an ideal world, we'd check the objects in topological
    order: commits followed by trees followed by blobs. In that
    case we could avoid ever loading an object twice, since all
    markings would be complete by the time we get to the marked
    objects. And indeed, if we are checking a single packfile,
    this is the order in which Git will generally write the
    objects. But we can't count on that:
    
      1. git-fsck may show us the objects in arbitrary order
         (loose objects are fed in sha1 order, but we may also
         have multiple packs, and we process each pack fully in
         sequence).
    
      2. The type ordering is just what git-pack-objects happens
         to write now. The pack format does not require a
         specific order, and it's possible that future versions
         of Git (or a custom version trying to fool official
         Git's fsck checks!) may order it differently.
    
      3. We may not even be fscking all of the relevant objects
         at once. Consider pushing with transfer.fsckObjects,
         where one push adds a blob at path "foo", and then a
         second push adds the same blob at path ".gitmodules".
         The blob is not part of the second push at all, but we
         need to mark and check it.
    
    So in the general case, we need to make up to three passes
    over the objects: once to make sure we've seen all commits,
    then once to cover any trees we might have missed, and then
    a final pass to cover any .gitmodules blobs we found in the
    second pass.
    
    We can simplify things a bit by loosening the requirement
    that we find .gitmodules only at root trees. Technically
    a file like "subdir/.gitmodules" is not parsed by Git, but
    it's not unreasonable for us to declare that Git is aware of
    all ".gitmodules" files and make them eligible for checking.
    That lets us drop the root-tree requirement, which
    eliminates one pass entirely. And it makes our worst case
    much better: instead of potentially queueing every root tree
    to be re-examined, the worst case is that we queue each
    unique .gitmodules blob for a second look.
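
    The loosened scheme can be sketched roughly as follows. This is
    a toy model under stated assumptions, not the code in this
    patch: blobs are identified by small integer ids instead of
    object ids, and the names found, done, see_tree_entry(),
    see_blob() and finish() are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOBS 64 /* toy limit; an assumption of this sketch */

/* Toy state indexed by a small integer id (stand-in for an object id). */
static unsigned char found[MAX_BLOBS]; /* some tree names this blob ".gitmodules" */
static unsigned char done[MAX_BLOBS];  /* this blob has already been checked */
static int nr_checked;

static void check_gitmodules_blob(int id)
{
	assert(id >= 0 && id < MAX_BLOBS);
	if (done[id])
		return;
	done[id] = 1;
	nr_checked++; /* the real content checks would go here */
}

/* Called for every tree entry, whether or not the tree is a root tree. */
static void see_tree_entry(const char *name, int blob_id)
{
	assert(blob_id >= 0 && blob_id < MAX_BLOBS);
	if (!strcmp(name, ".gitmodules"))
		found[blob_id] = 1;
}

/* Called as each blob goes by; check it now only if it is already marked. */
static void see_blob(int blob_id)
{
	assert(blob_id >= 0 && blob_id < MAX_BLOBS);
	if (found[blob_id])
		check_gitmodules_blob(blob_id);
}

/* Second look: re-examine marked blobs that were seen too early or not at all. */
static int finish(void)
{
	int id, requeued = 0;

	for (id = 0; id < MAX_BLOBS; id++) {
		if (found[id] && !done[id]) {
			check_gitmodules_blob(id);
			requeued++;
		}
	}
	return requeued;
}
```

    In this sketch, a blob that arrives after a tree has marked it is
    checked immediately, and finish() only revisits blobs that were
    marked after they had already gone by (or that never appeared in
    the stream at all). Each unique blob is checked at most once.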
    
    This patch just adds the boilerplate to find .gitmodules
    files. The actual content checks will come in a subsequent
    commit.
    
    Signed-off-by: Jeff King <peff@peff.net>