1. 20 Mar, 2019 1 commit
    • fetch_pack(): drop unused parameters · 0f804b0b
      Jeff King authored
      We don't need the caller of fetch_pack() to pass in "dest", which is the
      remote URL. Since ba227857 (Reduce the number of connects when
      fetching, 2008-02-04), the caller is responsible for calling
      git_connect() itself, and our "dest" parameter is unused.
      
      That commit also started passing us the resulting "conn" child_process
      from git_connect(). But likewise, we do not need to do anything with it.
      The descriptors in "fd" are enough for us, and the caller is responsible
      for cleaning up "conn".
      
      We can just drop both parameters.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 04 Oct, 2018 1 commit
    • fetch-pack: exclude blobs when lazy-fetching trees · 4c7f9567
      Jonathan Tan authored
      A partial clone with missing trees can be obtained using "git clone
      --filter=tree:none <repo>". In such a repository, when a tree needs to
      be lazily fetched, any tree or blob it directly or indirectly references
      is fetched as well, regardless of whether the original command required
      those objects, or if the local repository already had some of them.
      
      This is because the fetch protocol, which the lazy fetch uses, does not
      allow clients to request that only the wanted objects be sent, which
      would be the ideal solution. This patch implements a partial solution:
      specify the "blob:none" filter, somewhat reducing the fetch payload.
      
      This change has no effect when lazily fetching blobs (due to how filters
      work). And when lazily fetching a commit (such repositories are
      difficult to construct and are not a use case we support very well,
      but they are possible), referenced commits and trees are still
      fetched - only the blobs are not.
      
      The necessary code change is done in fetch_pack() instead of somewhere
      closer to where the "filter" instruction is written to the wire so that
      only one part of the code needs to be changed in order for users of all
      protocol versions to benefit from this optimization.
      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 03 Jul, 2018 2 commits
    • fetch-pack: support negotiation tip whitelist · 3390e42a
      Jonathan Tan authored
      During negotiation, fetch-pack eventually reports as "have" lines all
      commits reachable from all refs. Allow the user to restrict the commits
      sent in this way by providing a whitelist of tips; only the tips
      themselves and their ancestors will be sent.
      
      Both globs and single objects are supported.
      
      This feature is only supported for protocols that support connect or
      stateless-connect (such as HTTP with protocol v2).
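      The whitelist idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not git's implementation: the function name is made up, and fnmatch-style globbing stands in for git's actual wildmatch.

```python
from fnmatch import fnmatch

def negotiation_tips(all_refs, patterns):
    """Return only the refs whose tips (and hence whose ancestors) may be
    reported as "have" lines: a ref qualifies if any whitelist entry
    matches it, either as a glob or as an exact name."""
    return [ref for ref in all_refs
            if any(fnmatch(ref, pat) for pat in patterns)]
```

      With a whitelist of `["refs/heads/*"]`, only branch tips would be considered during negotiation, skipping tags and other refs entirely.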
      
      This will speed up negotiation when the repository has multiple
      relatively independent branches (for example, when a repository
      interacts with multiple repositories, such as with linux-next [1] and
      torvalds/linux [2]), and the user knows which local branch is likely to
      have commits in common with the upstream branch they are fetching.
      
      [1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/
      [2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/
      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • fetch-pack: write shallow, then check connectivity · cf1e7c07
      Jonathan Tan authored
      When fetching, connectivity is checked after the shallow file is
      updated. There are 2 issues with this: (1) the connectivity check is
      only performed up to ancestors of existing refs (which is not thorough
      enough if we were deepening an existing ref in the first place), and (2)
      there is no rollback of the shallow file if the connectivity check
      fails.
      
      To solve (1), update the connectivity check to check the ancestry chain
      completely in the case of a deepening fetch by refraining from passing
      "--not --all" when invoking rev-list in connected.c.
      
      To solve (2), have fetch_pack() perform its own connectivity check
      before updating the shallow file. To support existing use cases in which
      "git fetch-pack" is used to download objects without much regard as to
      the connectivity of the resulting objects with respect to the existing
      repository, the connectivity check is only done if necessary (that is,
      the fetch is not a clone, and the fetch involves shallow/deepen
      functionality). "git fetch" still performs its own connectivity check,
      preserving correctness but sometimes performing redundant work. This
      redundancy is mitigated by the fact that fetch_pack() reports if it has
      performed a connectivity check itself, and if the transport supports
      connect or stateless-connect, it will bubble up that report so that "git
      fetch" knows not to perform the connectivity check in such a case.
      
      This was noticed when a user tried to deepen an existing repository by
      fetching with --no-shallow from a server that did not send all necessary
      objects - the connectivity check as run by "git fetch" succeeded, but a
      subsequent "git fsck" failed.
      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. 24 Apr, 2018 1 commit
  5. 15 Mar, 2018 1 commit
  6. 08 Dec, 2017 1 commit
  7. 05 Dec, 2017 1 commit
    • introduce fetch-object: fetch one promisor object · 88e2f9ed
      Jonathan Tan authored
      Introduce fetch-object, providing the ability to fetch one object from a
      promisor remote.
      
      This uses fetch-pack. To do this, the transport mechanism has been
      updated with 2 flags, "from-promisor" to indicate that the resulting
      pack comes from a promisor remote (and thus should be annotated as such
      by index-pack), and "no-dependents" to indicate that only the objects
      themselves need to be fetched (but fetching additional objects is
      nevertheless safe).
      
      Whenever "no-dependents" is used, fetch-pack will refrain from using any
      object flags, because it is most likely invoked as part of a dynamic
      object fetch by another Git command (which may itself use object flags).
      An alternative to this is to leave fetch-pack alone, and instead update
      the allocation of flags so that fetch-pack's flags never overlap with
      any others, but this will end up shrinking the number of flags available
      to nearly every other Git command (that is, every Git command that
      accesses objects), so the approach in this commit was used instead.
      
      This will be tested in a subsequent commit.
      Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 31 Mar, 2017 1 commit
    • Rename sha1_array to oid_array · 910650d2
      brian m. carlson authored
      Since this structure handles an array of object IDs, rename it to struct
      oid_array.  Also rename the accessor functions and the initialization
      constant.
      
      This commit was produced mechanically by providing non-Documentation
      files to the following Perl one-liners:
      
          perl -pi -E 's/struct sha1_array/struct oid_array/g'
          perl -pi -E 's/\bsha1_array_/oid_array_/g'
          perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  9. 02 Mar, 2017 1 commit
  10. 13 Jun, 2016 4 commits
  11. 11 Dec, 2013 3 commits
    • fetch: add --update-shallow to accept refs that update .git/shallow · 48d25cae
      Duy Nguyen authored
      The same steps are done as when --update-shallow is not given. The
      only difference is we now add all shallow commits in "ours" and
      "theirs" to .git/shallow (aka "step 8").
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • clone: support remote shallow repository · beea4152
      Duy Nguyen authored
      Cloning from a shallow repository does not follow the "8 steps for new
      .git/shallow" because doing so would require going through step 6 for
      all refs, which means walking commits all the way to the bottom.
      
      Instead the rule to create .git/shallow is simpler and, more
      importantly, cheap: if a shallow commit is found in the pack, it's
      probably used (i.e. reachable from some refs), so we add it. Others
      are dropped.
      
      One may notice this method seems flawed because of the word "probably". A
      shallow commit may not be reachable from any refs at all if it's
      attached to an object island (a group of objects that are not
      reachable by any refs).
      
      If that object island is not complete, a new fetch request may send
      more objects to connect it to some ref. At that time, because we
      incorrectly installed the shallow commit in this island, the user will
      not see anything after that commit (fsck is still ok). This is not
      desired.
      
      Given that object islands are rare (C Git never sends such islands for
      security reasons) and do not really harm the repository integrity, a
      tradeoff is made to surprise the user occasionally but work faster
      every day.
      
      A new option --strict could be added later that follows exactly the 8
      steps. "git prune" can also learn to remove dangling objects _and_ the
      shallow commits that are attached to them from .git/shallow.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 09 Dec, 2013 1 commit
    • git fetch-pack: add --diag-url · 5610b7c0
      Torsten Bögershausen authored
      The main purpose is to trace the URL parser called by git_connect() in
      connect.c
      
      The main features of the parser can be listed as this:
      
      - parse out host and path for URLs with a scheme (git:// file:// ssh://)
      - parse host names embedded by [] correctly
      - extract the port number, if present
      - separate URLs like "file" (which are local)
        from URLs like "host:repo" which should use ssh
      
      Add the new parameter "--diag-url" to "git fetch-pack", which prints
      the value for protocol, host and path to stderr and exits.
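      The parsing rules listed above can be sketched as a toy classifier. This is a simplified illustration in Python, not git's actual C parser: the function names, dictionary keys, and the "local" label are all made up for this sketch, and real-world cases (e.g. Windows drive letters like "C:/path") are deliberately ignored.

```python
def _split_port(host):
    """Split host[:port], handling IPv6 hosts embedded in []."""
    if host.startswith("["):
        inner, _, rest = host[1:].partition("]")
        return inner, (rest[1:] if rest.startswith(":") else None)
    h, _, port = host.partition(":")
    return h, (port or None)

def diag_url(url):
    """Break a URL into protocol, host, port and path (illustrative only)."""
    for scheme in ("git", "ssh", "http", "https", "file"):
        prefix = scheme + "://"
        if url.startswith(prefix):
            rest = url[len(prefix):]
            host, slash, path = rest.partition("/")
            host, port = _split_port(host)
            return {"protocol": scheme, "host": host or None,
                    "port": port, "path": slash + path}
    # No scheme: "host:repo" means ssh; a bare path like "file" is local.
    if ":" in url and not url.startswith("/"):
        host, _, path = url.partition(":")
        return {"protocol": "ssh", "host": host, "port": None, "path": path}
    return {"protocol": "local", "host": None, "port": None, "path": url}
```

      For example, "git://[::1]:9418/repo" splits into host "::1", port "9418" and path "/repo", while plain "host:repo" is classified as ssh.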
      Signed-off-by: Torsten Bögershausen <tboegi@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 08 Jul, 2013 1 commit
    • cache.h: move remote/connect API out of it · 47a59185
      Junio C Hamano authored
      The definition of "struct ref" in "cache.h", a header file so
      central to the system, always confused me.  This structure is not
      about the local ref used by sha1-name API to name local objects.
      
      It is what refspecs are expanded into, after finding out what refs
      the other side has, to define what refs are updated after object
      transfer succeeds to what values.  It belongs to "remote.h" together
      with "struct refspec".
      
      While we are at it, also move the types and functions related to the
      Git transport connection to a new header file connect.h
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 28 May, 2013 1 commit
    • clone: open a shortcut for connectivity check · c6807a40
      Duy Nguyen authored
      In order to make sure the cloned repository is good, we run "rev-list
      --objects --not --all $new_refs" on the repository. This is expensive
      on large repositories. This patch attempts to mitigate the impact in
      this special case.
      
      In the "good" clone case, we only have one pack. If all of the
      following are met, we can be sure that all objects reachable from the
      new refs exist, which is the intention of running "rev-list ...":
      
       - all refs point to an object in the pack
       - there are no dangling pointers in any object in the pack
       - no objects in the pack point to objects outside the pack
      
      The second and third checks can be done with the help of index-pack as
      a slight variation of --strict check (which introduces a new condition
      for the shortcut: pack transfer must be used and the number of objects
      large enough to call index-pack). The first is checked in
      check_everything_connected after we get an "ok" from index-pack.
      
      "index-pack + new checks" is still faster than the current "index-pack
      + rev-list", which is the whole point of this patch. If any of the
      conditions fail, we fall back to the good old but expensive "rev-list
      ..". In that case it's even more expensive because we have to pay for
      the new checks in index-pack. But that should only happen when the
      other side is either buggy or malicious.
      
      Cloning linux-2.6 over file://
      
              before         after
      real    3m25.693s      2m53.050s
      user    5m2.037s       4m42.396s
      sys     0m13.750s      0m16.574s
      
      A more realistic test with ssh:// over wireless
      
              before         after
      real    11m26.629s     10m4.213s
      user    5m43.196s      5m19.444s
      sys     0m35.812s      0m37.630s
      
      This shortcut is not applied to shallow clones, partly because shallow
      clones should have no more objects than a usual fetch and the cost of
      rev-list is acceptable, partly to avoid dealing with corner cases when
      grafting is involved.
      
      This shortcut does not apply to unpack-objects code path either
      because the number of objects must be small in order to trigger that
      code path.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  15. 07 Feb, 2013 1 commit
    • fetch: use struct ref to represent refs to be fetched · f2db854d
      Junio C Hamano authored
      Even though "git fetch" has full infrastructure to parse refspecs to
      be fetched and match them against the list of refs to come up with
      the final list of refs to be fetched, the list of refs that are
      requested to be fetched were internally converted to a plain list of
      strings at the transport layer and then passed to the underlying
      fetch-pack driver.
      
      Stop this conversion and instead pass around an array of refs.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  16. 12 Sep, 2012 3 commits
  17. 02 Apr, 2012 1 commit
    • fetch-pack: new --stdin option to read refs from stdin · 078b895f
      Ivan Todoroski authored
      If a remote repo has too many tags (or branches), cloning it over the
      smart HTTP transport can fail because remote-curl.c puts all the refs
      from the remote repo on the fetch-pack command line. This can make the
      command line longer than the global OS command line limit, causing
      fetch-pack to fail.
      
      This is especially a problem on Windows where the command line limit is
      orders of magnitude shorter than Linux. There are already real repos out
      there that msysGit cannot clone over smart HTTP due to this problem.
      
      Here is an easy way to trigger this problem:
      
      	git init too-many-refs
      	cd too-many-refs
      	echo bla > bla.txt
      	git add .
      	git commit -m test
      	sha=$(git rev-parse HEAD)
      	tag=$(perl -e 'print "bla" x 30')
      	for i in `seq 50000`; do
      		echo $sha refs/tags/$tag-$i >> .git/packed-refs
      	done
      
      Then share this repo over the smart HTTP protocol and try cloning it:
      
      	$ git clone http://localhost/.../too-many-refs/.git
      	Cloning into 'too-many-refs'...
      	fatal: cannot exec 'fetch-pack': Argument list too long
      
      50k tags is obviously an absurd number, but it is required to
      demonstrate the problem on Linux because it has a much more generous
      command line limit. On Windows the clone fails with as little as 500
      tags in the above loop, which is getting uncomfortably close to the
      number of tags you might see in real long lived repos.
      
      This is not just theoretical, msysGit is already failing to clone our
      company repo due to this. It's a large repo converted from CVS, nearly
      10 years of history.
      
      Four possible solutions were discussed on the Git mailing list (in no
      particular order):
      
      1) Call fetch-pack multiple times with smaller batches of refs.
      
      This was dismissed as inefficient and inelegant.
      
      2) Add option --refs-fd=$n to pass an fd from which to read the refs.
      
      This was rejected because inheriting descriptors other than
      stdin/stdout/stderr through exec() is apparently problematic on Windows,
      plus it would require changes to the run-command API to open extra
      pipes.
      
      3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
      
      This was not favored because of the temp file requirement.
      
      4) Add option --stdin to pass the refs on stdin, one per line.
      
      In the end this option was chosen as the most efficient and most
      desirable from a scripting perspective.
      
      There was however a small complication when using stdin to pass refs to
      fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
      communication with the remote server.
      
      If we are going to sneak refs on stdin line by line, it would have to be
      done very carefully in the presence of --stateless-rpc, because when
      reading refs line by line we might read ahead too much data into our
      buffer and eat some of the remote protocol data which is also coming on
      stdin.
      
      One way to solve this would be to refactor get_remote_heads() in
      fetch-pack.c to accept a residual buffer from our stdin line parsing
      above, but this function is used in several places so other callers
      would be burdened by this residual buffer interface even when most of
      them don't need it.
      
      In the end we settled on the following solution:
      
      If --stdin is specified without --stateless-rpc, fetch-pack would read
      the refs from stdin one per line, in a script friendly format.
      
      However if --stdin is specified together with --stateless-rpc,
      fetch-pack would read the refs from stdin in packetized format
      (pkt-line) with a flush packet terminating the list of refs. This way we
      can read the exact number of bytes that we need from stdin, and then
      get_remote_heads() can continue reading from the same fd without losing
      a single byte of remote protocol data.
      
      This way the --stdin option only loses generality and scriptability when
      used together with --stateless-rpc, which is not easily scriptable
      anyway because it also uses pkt-line when talking to the remote server.
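      The packetized format referred to here is git's pkt-line framing: each packet is prefixed with its total length (including the four-byte length field itself) as four lowercase hex digits, and a bare "0000" length field is the flush packet. A minimal sketch of how a ref list could be packetized under that framing follows; this is illustrative Python, not git's C implementation, and the function names are made up.

```python
def pkt_line(payload: bytes) -> bytes:
    # The length field counts its own four bytes plus the payload.
    return b"%04x" % (len(payload) + 4) + payload

def flush_pkt() -> bytes:
    # A flush packet is the bare length field "0000" with no payload.
    return b"0000"

def packetize_refs(refs) -> bytes:
    # One ref per pkt-line, terminated by a flush packet, so the reader
    # knows exactly how many bytes to consume before the remote protocol
    # data begins on the same fd.
    return b"".join(pkt_line(r.encode() + b"\n") for r in refs) + flush_pkt()
```

      For example, packetize_refs(["refs/heads/master"]) yields b"0016refs/heads/master\n0000": the payload is 18 bytes, plus 4 for the length field, giving 0x16.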
      Signed-off-by: Ivan Todoroski <grnch@gmx.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 16 Mar, 2011 1 commit
    • standardize brace placement in struct definitions · 9cba13ca
      Jonathan Nieder authored
      In struct definitions, unlike functions, the prevailing style is for
      the opening brace to go on the same line as the struct name, like so:
      
       struct foo {
      	int bar;
      	char *baz;
       };
      
      Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
      matches as 'struct [a-z_]*$'.
      
      Linus sayeth:
      
       Heretic people all over the world have claimed that this inconsistency
       is ...  well ...  inconsistent, but all right-thinking people know that
       (a) K&R are _right_ and (b) K&R are right.
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  19. 05 Nov, 2009 1 commit
    • Smart fetch over HTTP: client side · 249b2004
      Shawn O. Pearce authored
      The git-remote-curl backend detects if the remote server supports
      the git-upload-pack service, and if so, runs git-fetch-pack locally
      in a pipe to generate the want/have commands.
      
      The advertisements from the server that were obtained during the
      discovery are passed into git-fetch-pack before the POST request
      starts, permitting server capability discovery and enablement.
      
      Common objects that are discovered are appended onto the request as
      have lines and are sent again on the next request.  This allows the
      remote side to reinitialize its in-memory list of common objects
      during the next request.
      
      Because all requests are relatively short, below git-remote-curl's
      1 MiB buffer limit, requests will use the standard Content-Length
      header and be valid HTTP/1.0 POST requests.  This makes the fetch
      client more tolerant of proxy servers which don't support HTTP/1.1
      or the chunked transfer encoding.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      CC: Daniel Barkalow <barkalow@iabervon.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  20. 05 Mar, 2008 1 commit
  21. 05 Feb, 2008 1 commit
    • Reduce the number of connects when fetching · ba227857
      Daniel Barkalow authored
      This shares the connection between getting the remote ref list and
      getting objects in the first batch. (A second connection is still used
      to follow tags).
      
      When we do not fetch objects (i.e. either ls-remote disconnects after
      getting the list of refs, or we decide we are already up-to-date), we
      clean up the connection properly; otherwise the connection is left
      open in need of cleaning up to avoid getting an error message from
      the remote end when ssh is used.
      Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  22. 19 Sep, 2007 4 commits
    • Always obtain fetch-pack arguments from struct fetch_pack_args · fa740529
      Shawn O. Pearce authored
      Copying the arguments from a fetch_pack_args into static globals
      within the builtin-fetch-pack module is error-prone and may give
      rise to cases where arguments supplied via the struct from the
      new fetch_pack() API may not be honored by the implementation.
      
      Here we reorganize all of the static globals into a single static
      struct fetch_pack_args instance and use memcpy() to move the data
      from the caller supplied structure into the globals before we
      execute our pack fetching implementation.  This strategy is more
      robust to additions and deletions of properties.
      
      As keep_pack is a single bit we have also introduced lock_pack to
      mean not only download and store the packfile via index-pack but
      also to lock it against repacking by creating a .keep file when
      the packfile itself is stored.  The caller must remove the .keep
      file when it is safe to do so.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Use 'unsigned:1' when we mean boolean options · bbaf4584
      Shawn O. Pearce authored
      These options are all strictly boolean (true/false).  It's easier to
      document this implicitly by making their storage type a single bit.
      There is no compelling memory space reduction reason for this change,
      it just makes the structure definition slightly more readable.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
    • Remove pack.keep after ref updates in git-fetch · 1788c39c
      Shawn O. Pearce authored
      If we are using a native packfile to perform a git-fetch invocation
      and the received packfile contained more than the configured limits
      of fetch.unpackLimit/transfer.unpackLimit then index-pack will output
      a single line saying "keep\t$sha1\n" to stdout.  This line needs to
      be captured and retained so we can delete the corresponding .keep
      file ("$GIT_DIR/objects/pack/pack-$sha1.keep") once all refs have
      been safely updated.
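      The bookkeeping described above amounts to mapping index-pack's status line to the path of the lock file to delete. A small illustrative sketch (Python, with a hypothetical helper name; git does this in C):

```python
import os

def keep_file_path(git_dir, status_line):
    """Given index-pack's "keep\t$sha1\n" status line, return the .keep
    file that must be removed once all refs are safely updated, or None
    if the line is not a keep report."""
    tag, _, sha1 = status_line.rstrip("\n").partition("\t")
    if tag != "keep" or not sha1:
        return None
    return os.path.join(git_dir, "objects", "pack", "pack-%s.keep" % sha1)
```

      The caller captures this line from index-pack's stdout, updates all refs, and only then unlinks the returned path, closing the race window with a concurrent git-repack.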
      
      This trick has long been in use with git-fetch.sh and its lower level
      helper git-fetch--tool as a way to allow index-pack to save the new
      packfile before the refs have been updated and yet avoid a race with
      any concurrently running git-repack process.  It was unfortunately
      lost when git-fetch.sh was converted to pure C and fetch--tool was
      no longer being invoked.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Daniel Barkalow's avatar