1. 16 Apr, 2019 2 commits
      packfile: fix pack basename computation · fc789156
      Jeff King authored
      When we have a multi-pack-index that covers many packfiles, we try to
      avoid opening the .idx for those packfiles. To do that we feed the pack
      name to midx_contains_pack(). But that function wants to see only the
      basename, which we compute using strrchr() to find the final slash. But
      that leaves an extra "/" at the start of our string.
      We can fix this by incrementing the pointer. That also raises the
      question of what to do when the name does not have a '/' at all. This
      should generally not happen (we always find files in "pack/"), but it
      doesn't hurt to be defensive here.
      Let's wrap all of that up in a helper function and make it publicly
      available, since a later patch will need to use it, too.
      The tests don't notice because there's nothing about opening those .idx
      files that would cause us to give incorrect output. It's just a little
      slower. The new test checks this case by corrupting the covered .idx,
      and then making sure we don't complain about it.
      We also have to tweak t5570, which intentionally corrupts a .idx file
      and expects us to notice it. When run with GIT_TEST_MULTI_PACK_INDEX,
      this will fail since we now will (correctly) not bother opening the .idx
      at all. We can fix that by unconditionally dropping any midx that's
      there, which ensures we'll have to read the .idx.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
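The basename fix described in this commit can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `basename_after_slash` is invented here, and it takes a plain string rather than whatever argument the real helper takes.

```c
#include <string.h>

/*
 * Hypothetical helper (name invented for this sketch): return the part of
 * `path` after the final '/', or `path` itself when it contains no slash.
 */
static const char *basename_after_slash(const char *path)
{
    const char *slash = strrchr(path, '/');

    /*
     * strrchr() returns a pointer to the '/' itself, so step one past it.
     * With no slash at all, fall back to the whole name (the defensive case).
     */
    return slash ? slash + 1 : path;
}
```

Returning `slash` unchanged would yield "/pack-1234.pack" rather than "pack-1234.pack", which is the extra leading "/" the message describes.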
      packfile.h: drop extern from function declarations · 336226c2
      Jeff King authored
      As CodingGuidelines recommends, we do not need an "extern" when
      declaring a public function. Let's drop these. Note that we leave the
      extern on report_garbage(), as that is actually a function pointer, not
      a function itself.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
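The distinction between a function and a function pointer can be illustrated with a small sketch. All names below (`count_garbage`, `garbage_hook`, `remember_garbage`) are hypothetical and are not taken from packfile.h; the header and source parts are shown in one block only so that it compiles on its own.

```c
#include <stddef.h>

/* --- what a header would declare --- */

/* A function declaration has external linkage by default; "extern" adds nothing. */
int count_garbage(const char *path);

/*
 * A function pointer is a variable. Without "extern", this line would be a
 * (tentative) definition in every file that includes the header.
 */
extern void (*garbage_hook)(unsigned seen_bits, const char *path);

/* --- what exactly one .c file would define --- */

void (*garbage_hook)(unsigned seen_bits, const char *path) = NULL;

static unsigned garbage_seen;

/* Example callback that just records the bits it was given. */
static void remember_garbage(unsigned seen_bits, const char *path)
{
    (void)path;
    garbage_seen |= seen_bits;
}

/* Calls the hook if one is installed; returns 1 if it was called, else 0. */
int count_garbage(const char *path)
{
    if (!garbage_hook)
        return 0;
    garbage_hook(1u, path);
    return 1;
}
```

A caller would install a callback with `garbage_hook = remember_garbage;` before calling `count_garbage()`.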
  2. 22 Mar, 2019 1 commit
  3. 19 Oct, 2018 1 commit
  4. 15 Oct, 2018 1 commit
  5. 20 Aug, 2018 1 commit
      packfile: add all_packs list · 0bff5269
      Derrick Stolee authored
      If a repo contains a multi-pack-index, then the packed_git list
      does not contain the packfiles that are covered by the multi-pack-index.
      This is important for doing object lookups, abbreviations, and
      approximating object count. However, there are many operations that
      really want to iterate over all packfiles.
      Create a new 'all_packs' linked list that contains this list, starting
      with the packfiles in the multi-pack-index and then continuing along
      the packed_git linked list.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
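The ordering of the combined list can be sketched as below. This is only an illustration under simplifying assumptions: `struct pack`, its fields, and `build_all_packs` are invented here, and the packs covered by the multi-pack-index are assumed to be available as a linked list of their own, which may not match how the real code stores them.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a pack entry. */
struct pack {
    const char *name;
    struct pack *next;      /* link within the list the pack came from */
    struct pack *next_all;  /* link within the combined list */
};

/*
 * Thread every pack from `midx_packs`, then every pack from `other_packs`,
 * onto one list through `next_all`, and return its head.
 */
static struct pack *build_all_packs(struct pack *midx_packs,
                                    struct pack *other_packs)
{
    struct pack *head = NULL;
    struct pack **tail = &head;
    struct pack *p;

    for (p = midx_packs; p; p = p->next) {
        *tail = p;
        tail = &p->next_all;
    }
    for (p = other_packs; p; p = p->next) {
        *tail = p;
        tail = &p->next_all;
    }
    *tail = NULL;
    return head;
}
```

Using a separate link field leaves the two source lists untouched, so lookups that should skip the covered packs can keep walking the original list.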
  6. 15 Aug, 2018 1 commit
  7. 14 Aug, 2018 1 commit
      for_each_*_object: move declarations to object-store.h · 0889aae1
      Jeff King authored
      The for_each_loose_object() and for_each_packed_object()
      functions are meant to be part of a unified interface: they
      use the same set of for_each_object_flags, and it's not
      inconceivable that we might one day add a single
      for_each_object() wrapper around them.
      Let's put them together in a single file, so we can avoid
      awkwardness like saying "the flags for this function are
      over in cache.h". Moving the loose functions to packfile.h
      is silly. Moving the packed functions to cache.h works, but
      makes the "cache.h is a kitchen sink" problem worse. The
      best place is the recently-created object-store.h, since
      these are quite obviously related to object storage.
      The for_each_*_in_objdir() functions do not use the same
      flags, but they are logically part of the same interface as
      for_each_loose_object(), and share callback signatures. So
      we'll move those, as well, as they also make sense in
      object-store.h.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 13 Aug, 2018 4 commits
      for_each_packed_object: support iterating in pack-order · 736eb88f
      Jeff King authored
      We currently iterate over objects within a pack in .idx
      order, which uses the object hashes. That means that it
      is effectively random with respect to the location of the
      object within the pack. If you're going to access the actual
      object data, there are two reasons to move linearly through
      the pack itself:
        1. It improves the locality of access in the packfile. In
           the cold-cache case, this may mean fewer disk seeks, or
           better usage of disk cache.
        2. We store related deltas together in the packfile. Which
           means that the delta base cache can operate much more
           efficiently if we visit all of those related deltas in
           sequence, as the earlier items are likely to still be
           in the cache.  Whereas if we visit the objects in
           random order, our cache entries are much more likely to
           have been evicted by unrelated deltas in the meantime.
      So if you're going to access the object contents, pack order
      is generally going to end up more efficient.
      But if you're simply generating a list of object names, or
      if you're going to end up sorting the result anyway, you're
      better off just using the .idx order, as finding the pack
      order means generating the in-memory pack-revindex.
      According to the numbers in 8b8dfd51 (pack-revindex:
      radix-sort the revindex, 2013-07-11), that takes about 200ms
      for linux.git, and 20ms for git.git (those numbers are a few
      years old but are still a good ballpark).
      That makes it a good optimization for some cases (we can
      save tens of seconds in git.git by having good locality of
      delta access, for a 20ms cost), but a bad one for others
      (e.g., right now "cat-file --batch-all-objects
      --batch-check='%(objectname)'" is 170ms in git.git, so adding
      20ms to that is noticeable).
      Hence this patch makes it an optional flag. You can't
      actually do any interesting timings yet, as it's not plumbed
      through to any user-facing tools like cat-file. That will
      come in a later patch.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
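The difference between the two orders, and the idea of selecting one with an optional flag, can be sketched as below. Everything here is hypothetical: `struct entry`, `VISIT_PACK_ORDER`, and `order_entries` are invented for the illustration, and the `qsort()` call merely stands in for the cost of building a reverse index; it is not how the patch itself works.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: an object name and where its data sits in the pack. */
struct entry {
    char name[8];
    unsigned long offset;
};

enum visit_flags {
    VISIT_PACK_ORDER = (1 << 0)  /* hypothetical flag: visit by pack offset */
};

static int by_name(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return strcmp(x->name, y->name);
}

static int by_offset(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Sort `e` in place: by name by default (the order an .idx lookup table
 * uses), or by offset when the caller asks for pack order.
 */
static void order_entries(struct entry *e, size_t n, enum visit_flags flags)
{
    qsort(e, n, sizeof(*e),
          (flags & VISIT_PACK_ORDER) ? by_offset : by_name);
}
```

With name order, neighbouring entries can sit at unrelated offsets; with offset order, a reader moves through the pack front to back, which is the locality argument made in the message.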
      for_each_*_object: give more comprehensive docstrings · 8b361551
      Jeff King authored
      We already mention the local/alternate behavior of these
      functions, but we can help clarify a few other behaviors:
       - there's no need to mention LOCAL_ONLY specifically, since
         we already reference the flags by type (and as we add
         more flags, we don't want to have to mention each)
       - clarify that reachability doesn't matter here; this is
         all accessible objects
       - what ordering/uniqueness guarantees we give
       - how pack-specific flags are handled for the loose case
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      for_each_*_object: take flag arguments as enum · a7ff6f5a
      Jeff King authored
      It's not wrong to pass our flags in an "unsigned", as we
      know it will be at least as large as the enum.  However,
      using the enum in the declaration makes it more obvious
      where to find the list of flags.
      While we're here, let's also drop the "extern" noise-words
      from the declarations, per our modern coding style.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      for_each_*_object: store flag definitions in a single location · 202e7f1e
      Jeff King authored
      These flags were split between cache.h and packfile.h,
      because some of the flags apply only to packs. However, they
      share a single numeric namespace, since both are respected
      for the packed variant. Let's make sure they're defined
      together so that nobody accidentally adds a new flag in one
      location that duplicates the other.
      While we're here, let's also put them in an enum (which
      helps debugger visibility) and use "(1<<n)" rather than
      counting powers of 2 manually.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
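The pattern of keeping all flags in one enum and writing them as shifts can be sketched as follows. The enum and flag names are hypothetical placeholders, not the definitions from the patch; only the layout is the point.

```c
/* Hypothetical flag names; every flag lives in this single enum. */
enum walk_flags {
    WALK_LOCAL_ONLY    = (1 << 0),
    WALK_PROMISOR_ONLY = (1 << 1),
    WALK_PACK_ORDER    = (1 << 2)
};

/* Taking the enum type in the signature points readers at the flag list. */
static int wants_local_only(enum walk_flags flags)
{
    return (flags & WALK_LOCAL_ONLY) != 0;
}
```

Because each value is written as `(1 << n)`, adding a flag means using the next unused `n`, and a duplicate bit is easy to spot when all definitions sit side by side.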
  9. 26 Jul, 2018 1 commit
  10. 20 Jul, 2018 3 commits
  11. 16 May, 2018 1 commit
      object-store: move object access functions to object-store.h · cbd53a21
      Stefan Beller authored
      This should make these functions easier to find and cache.h less
      overwhelming to read.
      In particular, this moves:
      - read_object_file
      - oid_object_info
      - write_object_file
      As a result, most of the codebase needs to #include object-store.h.
      In this patch the #include is only added to files that would fail to
      compile otherwise.  It would be better to #include wherever
      identifiers from the header are used.  That can happen later
      when we have better tooling for it.
      Signed-off-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 02 May, 2018 2 commits
  13. 26 Apr, 2018 3 commits
  14. 11 Apr, 2018 1 commit
  15. 26 Mar, 2018 10 commits
  16. 22 Mar, 2018 1 commit
  17. 05 Dec, 2017 1 commit
      fsck: introduce partialclone extension · 498f1f61
      Jonathan Tan authored
      Currently, Git has poor support for repos with very large numbers of
      objects, and for repos that wish to minimize manipulation of certain
      blobs (for example, because they are very large), even if the user
      operates mostly on part of the repo. This is because Git is designed on
      the assumption that every referenced object is available somewhere in
      the repo storage. In such an arrangement, the full set of objects is usually
      available in remote storage, ready to be lazily downloaded.
      Teach fsck about the new state of affairs. In this commit, teach fsck
      that missing promisor objects referenced from the reflog are not an
      error case; in future commits, fsck will be taught about other cases.
      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 23 Aug, 2017 5 commits