1. 18 Jul, 2018 1 commit
  2. 21 Jun, 2018 2 commits
• pack-bitmap: add free function · f3c23db2
      Jonathan Tan authored
      Add a function to free struct bitmap_index instances, and use it where
      needed (except when rebuild_existing_bitmaps() is used, since it creates
      references to the bitmaps within the struct bitmap_index passed to it).
      
      Note that the hashes field in struct bitmap_index is not freed because
      it points to another field within the same struct. The documentation for
      that field has been updated to clarify that.
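
To illustrate the shape of such a free function (the field list and
cleanup steps below are a sketch, not the actual free_bitmap_index()
in pack-bitmap.c, which releases more state):

    /* Illustrative sketch only. */
    void free_bitmap_index(struct bitmap_index *b)
    {
        if (!b)
            return;
        ewah_pool_free(b->commits);   /* per-type EWAH bitmaps */
        ewah_pool_free(b->trees);
        ewah_pool_free(b->blobs);
        ewah_pool_free(b->tags);
        /* note: b->hashes is not freed; it points into another
         * field of *b, as described above */
        free(b);
    }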
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-bitmap: remove bitmap_git global variable · 3ae5fa07
      Jonathan Tan authored
      Remove the bitmap_git global variable. Instead, generate on demand an
      instance of struct bitmap_index for code that needs to access it.
      
This gives us significantly more control over the lifetime of
struct bitmap_index instances. In particular, packs can now be closed
without worrying whether an unnecessarily long-lived "pack" field in a
struct bitmap_index still points to them.
      
      The bitmap API is also clearer in that we need to first obtain a struct
      bitmap_index, then we use it.
      
      This patch raises two potential issues: (1) memory for the struct
      bitmap_index is allocated without being freed, and (2)
      prepare_bitmap_git() and prepare_bitmap_walk() can reuse a previously
      loaded bitmap. For (1), this will be dealt with in a subsequent patch in
      this patch set that also deals with freeing the contents of the struct
      bitmap_index (which were not freed previously, because they have global
      scope). For (2), current bitmap users only load the bitmap once at most
      (note that pack-objects can use bitmaps or write bitmaps, but not both
      at the same time), so support for reuse has no effect - and future users
      can pass around the struct bitmap_index * obtained if they need to do 2
      or more things with the same bitmap.
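
As a rough illustration of the reworked API (signatures approximated
from the description above, not copied from the patch):

    /* hedged sketch: the caller now owns the instance returned by
     * prepare_bitmap_walk() instead of relying on a global */
    struct bitmap_index *bitmap_git = prepare_bitmap_walk(revs);
    if (bitmap_git) {
        traverse_bitmap_commit_list(bitmap_git, show_reachable);
        /* ... pass bitmap_git around for further bitmap work ... */
    }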
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 24 May, 2018 1 commit
  4. 17 May, 2018 2 commits
  5. 06 May, 2018 2 commits
• Replace all die("BUG: ...") calls by BUG() ones · 033abf97
      Johannes Schindelin authored
      In d8193743 (usage.c: add BUG() function, 2017-05-12), a new macro
      was introduced to use for reporting bugs instead of die(). It was then
      subsequently used to convert one single caller in 588a538a
      (setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
      
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear about why only one call site was converted, or what the
plan was for the other, similar calls to die() that report bugs.
      
      Let's just convert all remaining ones in one fell swoop.
      
      This trick was performed by this invocation:
      
      	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: validation and documentation about unreachable options · 58bd77b6
      Duy Nguyen authored
These options were added in [1] [2] [3]. All of them depend on running
rev-list internally, which is normally the case since they are always
used with "--all --objects", which implies --revs. But let's keep this
dependency explicit.

While at it, add documentation for them. These are mostly used
internally by git-repack, but it's still good not to have to chase
down the right commit message to learn how they work.
      
      [1] ca11b212 (let pack-objects do the writing of unreachable objects
          as loose objects - 2008-05-14)
      [2] 08cdfb13 (pack-objects --keep-unreachable - 2007-09-16)
      [3] e26a8c47 (repack: extend --keep-unreachable to loose objects -
          2016-06-13)
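
A rough sketch of the kind of validation described above (the flag
names are assumed to match the command-line options in
builtin/pack-objects.c; the exact condition and message are not from
the actual patch):

    /* sketch: these flags rely on the internal rev-list walk */
    if (!use_internal_rev_list &&
        (unpack_unreachable || keep_unreachable || pack_loose_unreachable))
        die(_("this option cannot be used without --revs"));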
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. 02 May, 2018 2 commits
  7. 26 Apr, 2018 2 commits
  8. 16 Apr, 2018 14 commits
• pack-objects: show some progress when counting kept objects · 5af05043
      Duy Nguyen authored
We only show progress when there are new objects to be packed. But
when --keep-pack is specified on the base pack, we will exclude most
of the objects. This makes 'pack-objects' stay silent for a long time
while the counting phase is going on.
      
      Let's show some progress whenever we visit an object instead. The old
      "Counting objects" is renamed to "Enumerating objects" and a new
      progress "Counting objects" line is added.
      
      This new "Counting objects" line should progress pretty quick when the
      system is beefy. But when the system is under pressure, the reading
      object header done in this phase could be slow and showing progress is
      an improvement over staying silent in the current code.
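
A minimal sketch of the idea using git's progress API from progress.h
(the loop and counter names are illustrative, not the actual counting
code):

    struct progress *progress = start_progress("Counting objects", 0);
    unsigned seen = 0;
    /* ... for each object visited during the counting phase ... */
    display_progress(progress, ++seen);
    /* ... when done ... */
    stop_progress(&progress);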
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• gc --auto: exclude base pack if not enough mem to "repack -ad" · 9806f5a7
      Duy Nguyen authored
pack-objects can be a big memory hog, especially on large repos;
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem has also been known for a long
time.
      
      Recent patches add an option to do just this, but it has to be either
      configured or activated manually. This patch lets `git gc --auto`
      activate this mode automatically when it thinks `repack -ad` will use
      a lot of memory and start affecting the system due to swapping or
      flushing OS cache.
      
gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is a
desktop environment where many other applications are running).
      
      This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
      If it is, it is assumed that the user already knows what they want.
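
A very rough sketch of that estimate (all names here are illustrative
and the real heuristic in builtin/gc.c accounts for more terms; this
is not the actual code):

    /* approximate pack-objects' heap usage and check whether it fits
     * in half of physical memory; uint64_t is from <stdint.h> */
    static int repack_would_thrash(uint64_t nr_objects, uint64_t entry_size,
                                   uint64_t delta_cache, uint64_t window_mem,
                                   uint64_t total_ram)
    {
        uint64_t heap = nr_objects * entry_size + delta_cache + window_mem;
        return heap >= total_ram / 2;
    }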
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• repack: add --keep-pack option · ed7e5fc3
      Duy Nguyen authored
We allow existing packs to be kept by adding companion .keep files.
This is helpful when a pack is permanently kept. In the next patch,
git-gc just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.
      
      A note about why the pack_keep field cannot be reused and
      pack_keep_in_core has to be added. This is about the case when
      --keep-pack is specified together with either --keep-unreachable or
      --unpack-unreachable, but --honor-pack-keep is NOT specified.
      
In this case, we want to exclude objects from the packs specified on
the command line, not from the ones with .keep files. If only one bit
flag were used, we would have to clear pack_keep on pack files that
have a .keep file.
      
But we can't make any assumption about unreachable objects in .keep
packs. If the "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is to ignore all packs that have
either a .keep file or --keep-pack.
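
Sketched out, the two separate flags described above sit in
struct packed_git roughly like this (layout simplified; see
packfile.h for the real definition):

    struct packed_git {
        /* ... */
        unsigned pack_keep:1;          /* an on-disk .keep file exists */
        unsigned pack_keep_in_core:1;  /* kept only via --keep-pack    */
        /* ... */
    };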
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: shrink delta_size field in struct object_entry · 0aca34e8
      Duy Nguyen authored
      Allowing a delta size of 64 bits is crazy. Shrink this field down to
      20 bits with one overflow bit.
      
      If we find an existing delta larger than 1MB, we do not cache
      delta_size at all and will get the value from oe_size(), potentially
      from disk if it's larger than 4GB.
      
Note that since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this, so it is
thread-safe.
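
A simplified sketch of the overflow scheme (the names approximate
pack-objects.h but this is not the actual patch):

    struct entry_sketch {
        unsigned delta_size_:20;      /* caches deltas up to 1 MB   */
        unsigned delta_size_valid:1;  /* 0: too big, use oe_size()  */
    };

    static void set_delta_size(struct entry_sketch *e, unsigned long size)
    {
        if (size < (1ul << 20)) {
            e->delta_size_ = size;
            e->delta_size_valid = 1;
        } else {
            e->delta_size_valid = 0;  /* read it back later */
        }
    }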
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: shrink size field in struct object_entry · ac77d0c3
      Duy Nguyen authored
      It's very very rare that an uncompressed object is larger than 4GB
      (partly because Git does not handle those large files very well to
      begin with). Let's optimize it for the common case where object size
      is smaller than this limit.
      
Shrink the size field down to 31 bits plus one overflow bit. If the
size is too large, we read it back from disk. As noted in the previous
patch, we need to return the delta size instead of the canonical size
when the to-be-reused object entry type is a delta instead of a
canonical one.
      
      Add two compare helpers that can take advantage of the overflow
      bit (e.g. if the file is 4GB+, chances are it's already larger than
      core.bigFileThreshold and there's no point in comparing the actual
      value).
      
Another note about oe_get_size_slow(): this function MUST be
thread-safe because the SIZE() macro is used inside try_delta(), which
may run in parallel. Outside parallel code, no-contention locking
should be dirt cheap (or insignificant compared to I/O access anyway).
To exercise this code, it's best to run the test suite with something
like
      
          make test GIT_TEST_OE_SIZE=4
      
      which forces this code on all objects larger than 3 bytes.
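
A simplified sketch of one of those compare helpers (the real
oe_size_less_than() in pack-objects.h takes the packing_data context;
names and the helper below are illustrative):

    struct sized_entry {
        unsigned size_:31;      /* cached size, when it fits        */
        unsigned size_valid:1;  /* 0 means the real size is >= 2^31 */
    };

    /* stand-in for oe_get_size_slow(): read the size from the pack */
    unsigned long read_size_from_pack(const struct sized_entry *e);

    static int size_less_than(const struct sized_entry *e, unsigned long limit)
    {
        if (e->size_valid)
            return e->size_ < limit;
        if (limit < (1ul << 31))    /* real size >= 2^31 > limit */
            return 0;
        return read_size_from_pack(e) < limit;   /* slow path (disk) */
    }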
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: clarify the use of object_entry::size · 27a7d067
      Duy Nguyen authored
While this field most of the time contains the canonical object size,
there is one case where it does not: when we have found that the base
object of the delta in question is also to be packed, we will happily
reuse the delta by copying it over instead of regenerating it.
      
      "size" in this case will record the delta size, not canonical object
      size. Later on in write_reuse_object(), we reconstruct the delta
      header and "size" is used for this purpose. When this happens, the
      "type" field contains a delta type instead of a canonical type.
      Highlight this in the code since it could be tricky to see.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: don't check size when the object is bad · 660b3735
      Duy Nguyen authored
      sha1_object_info() in check_objects() may fail to locate an object in
      the pack and return type OBJ_BAD. In that case, it will likely leave
      the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch the "size" field.
      
This field should contain the default value zero, but we can't be sure
sha1_object_info() will not damage it. This becomes more important later
      when the object size may have to be retrieved back from the
      (non-existing) pack.
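
Sketched as a guard (an illustration of the intent, not the actual
check_object() change):

    int type = sha1_object_info(entry->idx.oid.hash, &size);
    if (type >= 0)
        entry->size = size;   /* otherwise keep the zero default */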
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: shrink z_delta_size field in struct object_entry · 0cb3c142
      Duy Nguyen authored
We only cache deltas when they are smaller than a certain limit. This
limit defaults to 1000 bytes, but we save the compressed length in a
64-bit field. Shrink that field down to 20 bits, so we can only cache
deltas up to 1MB. Larger deltas must be recomputed when the pack is
written out.
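
Sketched in the same spirit as the delta_size change (illustrative
only; drop_cached_delta() below stands in for freeing the cached
data, it is not a real function):

    if (z_size < (1u << 20))
        entry->z_delta_size = z_size;  /* keep the cached delta       */
    else
        drop_cached_delta(entry);      /* too big: recompute at write */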
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: refer to delta objects by index instead of pointer · 898eba5e
      Duy Nguyen authored
These delta pointers always point to elements of the objects[] array
in the packing_data struct. We can hold at most 4G of those objects
because the array size, nr_objects, is a uint32_t. We could therefore
use uint32_t indexes to address these elements instead of pointers. On
a 64-bit architecture (8 bytes per pointer) this would save 4 bytes
per pointer.
      
      Convert these delta pointers to indexes. Since we need to handle NULL
      pointers as well, the index is shifted by one [1].
      
      [1] This means we can only index 2^32-2 objects even though nr_objects
          could contain 2^32-1 objects. It should not be a problem in
          practice because when we grow objects[], nr_alloc would probably
          blow up long before nr_objects hits the wall.
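
A minimal sketch of the shifted-by-one encoding (the accessor names
mirror oe_delta()/oe_set_delta() in pack-objects.h, but the code is
illustrative, not the actual patch):

    /* index 0 means "no delta"; object i is stored as i + 1 */
    static struct object_entry *get_delta(struct packing_data *pack,
                                          const struct object_entry *e)
    {
        return e->delta_idx ? &pack->objects[e->delta_idx - 1] : NULL;
    }

    static void set_delta(struct packing_data *pack,
                          struct object_entry *e,
                          struct object_entry *delta)
    {
        e->delta_idx = delta ? (uint32_t)(delta - pack->objects) + 1 : 0;
    }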
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: move in_pack out of struct object_entry · 43fa44fa
      Duy Nguyen authored
Instead of using 8 bytes (on a 64-bit arch) to store a pointer to a
pack, use an index, since the number of packs should be relatively
small.

This limits the number of packs we can handle to 1k. Since we can't be
sure people will never run into a situation where they have more than
1k pack files, provide a fallback route for it.
      
If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate an in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
      
      The new simple test is just to make sure the too-many-packs code path
      is at least executed. The true test is running
      
          make test GIT_TEST_FULL_IN_PACK_ARRAY=1
      
      to take advantage of other special case tests.
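
An illustrative accessor for the two modes (the real oe_in_pack() in
pack-objects.h is similar in spirit; exact names and layout here are
approximations):

    static struct packed_git *get_in_pack(const struct packing_data *pack,
                                          const struct object_entry *e)
    {
        if (pack->in_pack_by_idx)                 /* <= 1k packs: small table */
            return pack->in_pack_by_idx[e->in_pack_idx];
        return pack->in_pack[e - pack->objects];  /* fallback: one per object */
    }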
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: move in_pack_pos out of struct object_entry · 06af3bba
      Duy Nguyen authored
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process).
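
A sketch of the lazily allocated side array (illustrative; the real
helpers in pack-objects.h look similar in spirit):

    /* a parallel array indexed by the object's position in objects[] */
    if (!pack->in_pack_pos)
        pack->in_pack_pos = xcalloc(pack->nr_alloc, sizeof(uint32_t));
    pack->in_pack_pos[e - pack->objects] = pos;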
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: use bitfield for object_entry::depth · b5c0cbd8
      Duy Nguyen authored
Because of struct packing, from now on we can only handle a max depth
of 4095 (or even lower when new booleans are added to this struct).
This should be OK since long delta chains will cause significant
slowdown anyway.
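
In other words (an illustrative declaration, not the actual struct
layout):

    unsigned depth:12;   /* 2^12 - 1 = 4095 is the maximum delta depth */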
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• pack-objects: turn type and in_pack_type to bitfields · fd9b1bae
      Duy Nguyen authored
      An extra field type_valid is added to carry the equivalent of OBJ_BAD
      in the original "type" field. in_pack_type always contains a valid
      type so we only need 3 bits for it.
      
A note about accepting OBJ_NONE as a "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry(), where the current code skips
setting the "type" field if the incoming type is zero. This does not
have any bad side effects because the "type" field should be
memset()'d anyway.
      
      But since we also need to set type_valid now, skipping oe_set_type()
      leaves type_valid zero/false, which will make oe_type() return
      OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
      prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
      
          fatal: unable to get type of object ...
      
Accepting OBJ_NONE [2] does sound wrong, but this is how it has been
for a very long time and I haven't had time to dig in further.
      
      [1] See 5c49c116 (pack-objects: better check_object() performances -
          2007-04-16)
      
      [2] 21666f1a (convert object type handling from a string to a number
          - 2007-02-26)
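
A simplified sketch of the accessor pair, grounded in the behaviour
described above (not the exact oe_type()/oe_set_type() code):

    static enum object_type get_type(const struct object_entry *e)
    {
        return e->type_valid ? e->type_ : OBJ_BAD;
    }

    static void set_type(struct object_entry *e, enum object_type type)
    {
        e->type_valid = type >= OBJ_NONE;   /* OBJ_NONE counts as valid */
        e->type_ = e->type_valid ? type : 0;
    }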
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
  9. 02 Apr, 2018 2 commits
  10. 26 Mar, 2018 3 commits
  11. 14 Mar, 2018 4 commits
  12. 06 Mar, 2018 1 commit
  13. 14 Feb, 2018 1 commit
  14. 02 Feb, 2018 1 commit
  15. 30 Jan, 2018 1 commit
  16. 24 Jan, 2018 1 commit