1. 08 Dec, 2017 2 commits
    • unpack-trees: batch fetching of missing blobs · c0c578b3
      Jonathan Tan authored
      When running checkout, first prefetch all blobs that are to be updated
      but are missing. This means that only one pack is downloaded during such
      operations, instead of one per missing blob.
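
The batching idea can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `struct entry`, `fetch_objects`, `prefetch_missing`, and `fetch_requests` are hypothetical names, and the "fetch" only counts requests so the effect of batching is visible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 64

/* Hypothetical stand-in for an index entry scheduled for checkout. */
struct entry {
    const char *oid;  /* object id of the blob */
    int update;       /* nonzero if checkout will write this entry */
    int present;      /* nonzero if the blob exists locally */
};

/* Counts how many fetch requests (one pack each) were issued. */
static int fetch_requests;

/* Hypothetical batched fetch: one request no matter how many ids. */
static void fetch_objects(const char **oids, size_t n)
{
    (void)oids;
    if (n > 0)
        fetch_requests++;
}

/* Collect every to-be-updated but missing blob, then fetch them together. */
static size_t prefetch_missing(struct entry *entries, size_t n)
{
    const char *batch[MAX_BATCH];
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (entries[i].update && !entries[i].present) {
            assert(count < MAX_BATCH);
            batch[count++] = entries[i].oid;
        }
    }

    /* A single request covers the whole batch. */
    fetch_objects(batch, count);

    /* Every updated entry is now either already local or just fetched. */
    for (size_t i = 0; i < n; i++)
        if (entries[i].update)
            entries[i].present = 1;

    return count;
}
```

With three missing blobs among the entries to be updated, this sketch issues one request rather than three; entries that are not being updated are left alone.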
      
      This operates only at the blob level - if a repository has missing
      trees, they are still fetched one at a time.
      
      This does not use the delayed checkout mechanism introduced in commit
      2841e8f8 ("convert: add "status=delayed" to filter process protocol",
      2017-06-30) due to significant conceptual differences - in particular,
      for partial clones, we already know what needs to be fetched based on
      the contents of the local repo alone, whereas for status=delayed, it is
      the filter process that tells us what needs to be checked in the end.

      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • sha1_file: support lazily fetching missing objects · 8b4c0103
      Jonathan Tan authored
      Teach sha1_file to fetch objects from the remote configured in
      extensions.partialclone whenever an object is requested but missing.
      
      The fetching of objects can be suppressed through a global variable.
      This is used by fsck and index-pack.
      
      However, by default, such fetching is not suppressed. This is meant as a
      temporary measure to ensure that all Git commands work in such a
      situation. Future patches will update some commands to either tolerate
      missing objects (without fetching them) or be more efficient in fetching
      them.
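
A lookup that falls back to a fetch, guarded by a global switch, might look roughly like the sketch below. All names here (`fetch_if_missing`, `object_info`, `fetch_from_remote`, the in-memory `local_store`) are illustrative assumptions rather than the commit's actual code, and the remote is assumed to always have the requested object.

```c
#include <assert.h>
#include <string.h>

#define STORE_CAP 16

/* Hypothetical global switch; a checker such as fsck would set it to 0. */
static int fetch_if_missing = 1;

/* Toy local object store: a list of object ids. */
static const char *local_store[STORE_CAP];
static int local_count;

/* Counts how often the remote was contacted. */
static int remote_fetches;

static int has_local(const char *oid)
{
    for (int i = 0; i < local_count; i++)
        if (strcmp(local_store[i], oid) == 0)
            return 1;
    return 0;
}

/* Hypothetical fetch from the promisor remote; assumed to succeed. */
static void fetch_from_remote(const char *oid)
{
    assert(local_count < STORE_CAP);
    local_store[local_count++] = oid;
    remote_fetches++;
}

/* Returns 0 if the object is available (possibly after a fetch), -1 if not. */
static int object_info(const char *oid)
{
    if (has_local(oid))
        return 0;
    if (!fetch_if_missing)
        return -1;          /* fetching suppressed: report it as missing */
    fetch_from_remote(oid);
    return has_local(oid) ? 0 : -1;
}
```

In this sketch, a caller that clears the switch sees the object as missing and no network traffic occurs, while the default path fetches the object once and finds it locally on later lookups.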
      
      In order to determine the code changes in sha1_file.c necessary, I
      investigated the following:
       (1) functions in sha1_file.c that take in a hash, without the caller
           needing to know how the object is stored (loose or packed)
       (2) functions in packfile.c (because I need to check callers that know
           about the loose/packed distinction and operate on both differently,
           and ensure that they can handle the concept of objects that are
           neither loose nor packed)
      
      (1) is handled by the modification to sha1_object_info_extended().
      
      For (2), I looked at for_each_packed_object and others.  For
      for_each_packed_object, the callers either already work or are fixed in
      this patch:
       - reachable - only to find recent objects
       - builtin/fsck - already knows about missing objects
       - builtin/cat-file - warning message added in this commit
      
      Callers of the other functions do not need to be changed:
       - parse_pack_index
         - http - indirectly from http_get_info_packs
         - find_pack_entry_one
           - this searches a single pack that is provided as an argument; the
             caller already knows (through other means) that the sought object
             is in a specific pack
       - find_sha1_pack
         - fast-import - appears to be an optimization to not store a file if
           it is already in a pack
         - http-walker - to search through a struct alt_base
         - http-push - to search through remote packs
       - has_sha1_pack
         - builtin/fsck - already knows about promisor objects
         - builtin/count-objects - informational purposes only (check if loose
           object is also packed)
         - builtin/prune-packed - check if object to be pruned is packed (if
           not, don't prune it)
         - revision - used to exclude packed objects if requested by user
         - diff - just for optimization

      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 05 Dec, 2017 1 commit
    • introduce fetch-object: fetch one promisor object · 88e2f9ed
      Jonathan Tan authored
      Introduce fetch-object, providing the ability to fetch one object from a
      promisor remote.
      
      This uses fetch-pack. To do this, the transport mechanism has been
      updated with two flags: "from-promisor", to indicate that the resulting
      pack comes from a promisor remote (and thus should be annotated as such
      by index-pack), and "no-dependents", to indicate that only the objects
      themselves need to be fetched (but fetching additional objects is
      nevertheless safe).
      
      Whenever "no-dependents" is used, fetch-pack will refrain from using any
      object flags, because it is most likely invoked as part of a dynamic
      object fetch by another Git command (which may itself use object flags).
      An alternative to this is to leave fetch-pack alone, and instead update
      the allocation of flags so that fetch-pack's flags never overlap with
      any others, but this will end up shrinking the number of flags available
      to nearly every other Git command (that is, every Git command that
      accesses objects), so the approach in this commit was used instead.
      
      This will be tested in a subsequent commit.

      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>