1. 16 Apr, 2019 2 commits
    • Jeff King's avatar
      packfile: fix pack basename computation · fc789156
      Jeff King authored
      When we have a multi-pack-index that covers many packfiles, we try to
      avoid opening the .idx for those packfiles. To do that we feed the pack
      name to midx_contains_pack(). But that function wants to see only the
      basename, which we compute using strrchr() to find the final slash. But
      that leaves an extra "/" at the start of our string.
      We can fix this by incrementing the pointer. That also raises the
      question of what to do when the name does not have a '/' at all. This
      should generally not happen (we always find files in "pack/"), but it
      doesn't hurt to be defensive here.
      Let's wrap all of that up in a helper function and make it publicly
      available, since a later patch will need to use it, too.
      The tests don't notice because there's nothing about opening those .idx
      files that would cause us to give incorrect output. It's just a little
      slower. The new test checks this case by corrupting the covered .idx,
      and then making sure we don't complain about it.
      We also have to tweak t5570, which intentionally corrupts a .idx file
      and expects us to notice it. When run with GIT_TEST_MULTI_PACK_INDEX,
      this will fail since we now will (correctly) not bother opening the .idx
      at all. We can fix that by unconditionally dropping any midx that's
      there, which ensures we'll have to read the .idx.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
    • Jeff King's avatar
      pack-revindex: open index if necessary · 4828ce98
      Jeff King authored
      We can't create a pack revindex if we haven't actually looked at the
      index. Normally we would never get as far as creating a revindex without
      having already been looking in the pack, so this code never bothered to
      double-check that pack->index_data had been loaded.
      But with the new multi-pack-index feature, many code paths might not
      load the individual pack .idx at all (they'd find objects via the midx
      and then open the .pack, but not its index).
      This can't yet be triggered in practice, because a bug in the midx code
      means we accidentally open up the individual .idx files anyway. But in
      preparation for fixing that, let's have the revindex code check that
      everything it needs has been loaded.
      In most cases this will just be a quick noop. But note that this does
      introduce a possibility of error (if we have to open the index and it's
      corrupt), so load_pack_revindex() now returns a result code, and callers
      need to handle the error.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  2. 01 Apr, 2019 1 commit
  3. 22 Mar, 2019 1 commit
  4. 15 Jan, 2019 1 commit
    • brian m. carlson's avatar
      tree-walk: store object_id in a separate member · ea82b2a0
      brian m. carlson authored
      When parsing a tree, we read the object ID directly out of the tree
      buffer. This is normally fine, but such an object ID cannot be used with
      oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
      there may not be that many bytes to copy.
      Instead, store the object ID in a separate struct member. Since we can
      no longer efficiently compute the path length, store that information as
      well in struct name_entry. Ensure we only copy the object ID into the
      new buffer if the path length is nonzero, as some callers will pass us
      an empty path with no object ID following it, and we will not want to
      read past the end of the buffer.
      Signed-off-by: brian m. carlson's avatarbrian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  5. 08 Jan, 2019 1 commit
  6. 13 Nov, 2018 3 commits
    • Jeff King's avatar
      object-store: provide helpers for loose_objects_cache · 3a2e0824
      Jeff King authored
      Our object_directory struct has a loose objects cache that all users of
      the struct can see. But the only one that knows how to load the cache is
      find_short_object_filename(). Let's extract that logic in to a reusable
      While we're at it, let's also reset the cache when we re-read the object
      directories. This shouldn't have an impact on performance, as re-reads
      are meant to be rare (and are already expensive, so we avoid them with
      things like OBJECT_INFO_QUICK).
      Since the cache is already meant to be an approximation, it's tempting
      to skip even this bit of safety. But it's necessary to allow more code
      to use it. For instance, fetch-pack explicitly re-reads the object
      directory after performing its fetch, and would be confused if we didn't
      clear the cache.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
    • Jeff King's avatar
      sha1-file: use an object_directory for the main object dir · f0eaf638
      Jeff King authored
      Our handling of alternate object directories is needlessly different
      from the main object directory. As a result, many places in the code
      basically look like this:
        for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
      That gets annoying when do_something() is non-trivial, and we've
      resorted to gross hacks like creating fake alternates (see
      Instead, let's give each raw_object_store a unified list of
      object_directory structs. The first will be the main store, and
      everything after is an alternate. Very few callers even care about the
      distinction, and can just loop over the whole list (and those who care
      can just treat the first element differently).
      A few observations:
        - we don't need r->objects->objectdir anymore, and can just
          mechanically convert that to r->objects->odb->path
        - object_directory's path field needs to become a real pointer rather
          than a FLEX_ARRAY, in order to fill it with expand_base_dir()
        - we'll call prepare_alt_odb() earlier in many functions (i.e.,
          outside of the loop). This may result in us calling it even when our
          function would be satisfied looking only at the main odb.
          But this doesn't matter in practice. It's not a very expensive
          operation in the first place, and in the majority of cases it will
          be a noop. We call it already (and cache its results) in
          prepare_packed_git(), and we'll generally check packs before loose
          objects. So essentially every program is going to call it
          immediately once per program.
          Arguably we should just prepare_alt_odb() immediately upon setting
          up the repository's object directory, which would save us sprinkling
          calls throughout the code base (and forgetting to do so has been a
          source of subtle bugs in the past). But I've stopped short of that
          here, since there are already a lot of other moving parts in this
        - Most call sites just get shorter. The check_and_freshen() functions
          are an exception, because they have entry points to handle local and
          nonlocal directories separately.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
    • Jeff King's avatar
      rename "alternate_object_database" to "object_directory" · 263db403
      Jeff King authored
      In preparation for unifying the handling of alt odb's and the normal
      repo object directory, let's use a more neutral name. This patch is
      purely mechanical, swapping the type name, and converting any variables
      named "alt" to "odb". There should be no functional change, but it will
      reduce the noise in subsequent diffs.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  7. 26 Oct, 2018 1 commit
    • Derrick Stolee's avatar
      packfile: close multi-pack-index in close_all_packs · dc7d6643
      Derrick Stolee authored
      Whenever we delete pack-files from the object directory, we need
      to also delete the multi-pack-index that may refer to those
      objects. Sometimes, this connection is obvious, like during a
      repack. Other times, this is less obvious, like when gc calls
      a repack command and then does other actions on the objects, like
      write a commit-graph file.
      The pattern we use to avoid out-of-date in-memory packed_git
      structs is to call close_all_packs(). This should also call
      close_midx(). Since we already pass an object store to
      close_all_packs(), this is a nicely scoped operation.
      This fixes a test failure when running t6500-gc.sh with
      Reported-by: Gábor Szeder's avatarSzeder Gábor <szeder.dev@gmail.com>
      Signed-off-by: 's avatarDerrick Stolee <dstolee@microsoft.com>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  8. 19 Oct, 2018 1 commit
  9. 15 Oct, 2018 2 commits
  10. 29 Aug, 2018 2 commits
    • Jeff King's avatar
      convert "hashcmp() != 0" to "!hasheq()" · 67947c34
      Jeff King authored
      This rounds out the previous three patches, covering the
      inequality logic for the "hash" variant of the functions.
      As with the previous three, the accompanying code changes
      are the mechanical result of applying the coccinelle patch;
      see those patches for more discussion.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
    • Jeff King's avatar
      convert "hashcmp() == 0" to hasheq() · e3ff0683
      Jeff King authored
      This is the partner patch to the previous one, but covering
      the "hash" variants instead of "oid".  Note that our
      coccinelle rule is slightly more complex to avoid triggering
      the call in hasheq().
      I didn't bother to add a new rule to convert:
        - hasheq(E1->hash, E2->hash)
        + oideq(E1, E2)
      Since these are new functions, there won't be any such
      existing callers. And since most of the code is already
      using oideq, we're not likely to introduce new ones.
      We might still see "!hashcmp(E1->hash, E2->hash)" from topics
      in flight. But because our new rule comes after the existing
      ones, that should first get converted to "!oidcmp(E1, E2)"
      and then to "oideq(E1, E2)".
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  11. 20 Aug, 2018 4 commits
  12. 13 Aug, 2018 2 commits
    • Jeff King's avatar
      for_each_packed_object: support iterating in pack-order · 736eb88f
      Jeff King authored
      We currently iterate over objects within a pack in .idx
      order, which uses the object hashes. That means that it
      is effectively random with respect to the location of the
      object within the pack. If you're going to access the actual
      object data, there are two reasons to move linearly through
      the pack itself:
        1. It improves the locality of access in the packfile. In
           the cold-cache case, this may mean fewer disk seeks, or
           better usage of disk cache.
        2. We store related deltas together in the packfile. Which
           means that the delta base cache can operate much more
           efficiently if we visit all of those related deltas in
           sequence, as the earlier items are likely to still be
           in the cache.  Whereas if we visit the objects in
           random order, our cache entries are much more likely to
           have been evicted by unrelated deltas in the meantime.
      So in general, if you're going to access the object contents
      pack order is generally going to end up more efficient.
      But if you're simply generating a list of object names, or
      if you're going to end up sorting the result anyway, you're
      better off just using the .idx order, as finding the pack
      order means generating the in-memory pack-revindex.
      According to the numbers in 8b8dfd51 (pack-revindex:
      radix-sort the revindex, 2013-07-11), that takes about 200ms
      for linux.git, and 20ms for git.git (those numbers are a few
      years old but are still a good ballpark).
      That makes it a good optimization for some cases (we can
      save tens of seconds in git.git by having good locality of
      delta access, for a 20ms cost), but a bad one for others
      (e.g., right now "cat-file --batch-all-objects
      --batch-check="%(objectname)" is 170ms in git.git, so adding
      20ms to that is noticeable).
      Hence this patch makes it an optional flag. You can't
      actually do any interesting timings yet, as it's not plumbed
      through to any user-facing tools like cat-file. That will
      come in a later patch.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
    • Jeff King's avatar
      for_each_*_object: take flag arguments as enum · a7ff6f5a
      Jeff King authored
      It's not wrong to pass our flags in an "unsigned", as we
      know it will be at least as large as the enum.  However,
      using the enum in the declaration makes it more obvious
      where to find the list of flags.
      While we're here, let's also drop the "extern" noise-words
      from the declarations, per our modern coding style.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  13. 20 Jul, 2018 8 commits
  14. 29 Jun, 2018 1 commit
  15. 13 Jun, 2018 1 commit
    • Jeremy Linton's avatar
      packfile: correct zlib buffer handling · b611396e
      Jeremy Linton authored
      The buffer being passed to zlib includes a NUL terminator that git
      needs to keep in place. unpack_compressed_entry() attempts to detect
      the case that the source buffer hasn't been fully consumed by
      checking to see if the destination buffer has been over consumed.
      This causes a problem, that more recent zlib patches have been
      poisoning the unconsumed portions of the buffer which overwrites
      the NUL byte, while correctly returning length and status.
      Let's place the NUL at the end of the buffer after inflate returns
      to assure that it doesn't result in problems for git even if its
      been overwritten by zlib.
      Signed-off-by: 's avatarJeremy Linton <lintonrjeremy@gmail.com>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  16. 06 May, 2018 1 commit
    • Johannes Schindelin's avatar
      Replace all die("BUG: ...") calls by BUG() ones · 033abf97
      Johannes Schindelin authored
      In d8193743 (usage.c: add BUG() function, 2017-05-12), a new macro
      was introduced to use for reporting bugs instead of die(). It was then
      subsequently used to convert one single caller in 588a538a
      (setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
      The cover letter of the patch series containing this patch
      (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
      terribly clear why only one call site was converted, or what the plan
      is for other, similar calls to die() to report bugs.
      Let's just convert all remaining ones in one fell swoop.
      This trick was performed by this invocation:
      	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
      Signed-off-by: Johannes Schindelin's avatarJohannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
  17. 02 May, 2018 4 commits
  18. 26 Apr, 2018 4 commits