1. 14 Nov, 2018 4 commits
    • hash: add an SHA-256 implementation using OpenSSL · 4b4e2918
      brian m. carlson authored
      We already have OpenSSL routines available for SHA-1, so add routines
      for SHA-256 as well.
      
      On a Core i7-6600U, this SHA-256 implementation compares favorably to
      the SHA1DC SHA-1 implementation:
      
      SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
      SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • sha256: add an SHA-256 implementation using libgcrypt · 27dc04c5
      brian m. carlson authored
      Generally, one gets better performance out of cryptographic routines
      written in assembly than C, and this is also true for SHA-256.  In
      addition, most Linux distributions cannot distribute Git linked against
      OpenSSL for licensing reasons.
      
      Most systems with GnuPG will also have libgcrypt, since it is a
      dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
      implementation for messages of a few KiB and larger.
      
      For comparison, on a Core i7-6600U, this implementation processes 16 KiB
      chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
      MiB/s.
      
      In addition, libgcrypt is licensed under the LGPL 2.1, which is
      compatible with the GPL.  Add an implementation of SHA-256 that uses
      libgcrypt.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Add a base implementation of SHA-256 support · 13eeedb5
      brian m. carlson authored
      SHA-1 is weak and we need to transition to a new hash function.  For
      some time, we have referred to this new function as NewHash.  Recently,
      we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
      SHA-256 are outlined in the thread starting at [1] and in the commit
      history for the hash function transition document.
      
      Add a basic implementation of SHA-256 based off libtomcrypt, which is in
      the public domain.  Optimize it and restructure it to meet our coding
      standards.  Pull in the update and final functions from the SHA-1 block
      implementation, as we know these function correctly with all compilers.
      This implementation is slower than SHA-1, but more performant
      implementations will be introduced in future commits.
      
      Wire up SHA-256 in the list of hash algorithms, and add a test that the
      algorithm works correctly.
      
      Note that with this patch, it is still not possible to switch to using
      SHA-256 in Git.  Additional patches are needed to prepare the code to
      handle a larger hash algorithm and further test fixes are needed.
      
      [1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • sha1-file: add a constant for hash block size · a2ce0a75
      brian m. carlson authored
      There is one place we need the hash algorithm block size: the HMAC code
      for push certs.  Expose this constant in struct git_hash_algo and expose
      values for SHA-1 and for the largest value of any hash.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
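To illustrate why HMAC needs the block size, here is a minimal sketch of the key-padding step that precedes the two hash passes. The names `sketch_hmac_pads` and `SKETCH_MAX_BLKSZ` are hypothetical and are not Git's identifiers; the value 64 is the block size of both SHA-1 and SHA-256, and keys longer than one block (which HMAC hashes first) are simply rejected here.

```c
#include <stddef.h>

/* Assumed largest block size of any supported hash (64 for SHA-1 and SHA-256). */
#define SKETCH_MAX_BLKSZ 64

/*
 * Derive the inner and outer HMAC pads from a key.  The key is zero-padded
 * to one hash block, then XORed with 0x36 (inner) and 0x5c (outer).
 * Returns 0 on success, -1 if the key or block size is out of range.
 */
static int sketch_hmac_pads(const unsigned char *key, size_t keylen,
                            size_t blksz,
                            unsigned char *ipad, unsigned char *opad)
{
    size_t i;

    if (keylen > blksz || blksz > SKETCH_MAX_BLKSZ)
        return -1;
    for (i = 0; i < blksz; i++) {
        unsigned char k = i < keylen ? key[i] : 0;
        ipad[i] = k ^ 0x36;
        opad[i] = k ^ 0x5c;
    }
    return 0;
}
```

A caller can size its pad buffers with the maximum constant and pass the per-algorithm block size at run time, so the same code works for every hash.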
  2. 22 Oct, 2018 1 commit
  3. 09 Feb, 2018 1 commit
  4. 02 Feb, 2018 2 commits
    • hash: create union for hash context allocation · ac73cedf
      brian m. carlson authored
      In various parts of our code, we want to allocate a structure
      representing the internal state of a hash algorithm.  The original
      implementation of the hash algorithm abstraction assumed we would do
      that using heap allocations, and added a context size element to struct
      git_hash_algo.  However, most of the existing code uses stack
      allocations and conversion would needlessly complicate various parts of
      the code.  Add a union for the purpose of allocating hash contexts on
      the stack and a typedef for ease of use.  Use this union for defining
      the init, update, and final functions to avoid casts.  Remove the ctxsz
      element for struct git_hash_algo, which is no longer very useful.
      
      This does mean that stack allocations will grow slightly as additional
      hash functions are added, but this should not be a significant problem,
      since we don't allocate many hash contexts.  The improved usability and
      benefits from avoiding dynamic allocation outweigh this small downside.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
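The union approach described in this commit can be sketched as follows. This is an illustration under stated assumptions, not Git's actual definitions: every `sketch_` name and both context layouts are hypothetical, and only the SHA-1 initial state values are the standard ones.

```c
#include <stddef.h>

/* Hypothetical per-algorithm states; real contexts hold more fields. */
struct sketch_sha1_ctx   { unsigned int state[5]; unsigned char buf[64]; };
struct sketch_sha256_ctx { unsigned int state[8]; unsigned char buf[64]; };

/* One union large enough for any supported algorithm's context. */
union sketch_hash_ctx {
    struct sketch_sha1_ctx sha1;
    struct sketch_sha256_ctx sha256;
};
typedef union sketch_hash_ctx sketch_hash_ctx;

/* Function types take the union directly, so callers need no casts. */
typedef void (*sketch_hash_init_fn)(sketch_hash_ctx *ctx);

static void sketch_sha1_init(sketch_hash_ctx *ctx)
{
    /* Standard SHA-1 initial state. */
    ctx->sha1.state[0] = 0x67452301u;
    ctx->sha1.state[1] = 0xEFCDAB89u;
    ctx->sha1.state[2] = 0x98BADCFEu;
    ctx->sha1.state[3] = 0x10325476u;
    ctx->sha1.state[4] = 0xC3D2E1F0u;
}
```

A caller declares `sketch_hash_ctx ctx;` on the stack and passes `&ctx` to whichever init function the selected algorithm provides. The union is as large as its biggest member, which is the stack growth the commit message mentions.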
    • hash: move SHA-1 macros to hash.h · 164e7163
      brian m. carlson authored
      Most of the other code dealing with SHA-1 and other hashes is located in
      hash.h, which is in turn loaded by cache.h.  Move the SHA-1 macros to
      hash.h as well, so we can use them in additional hash-related items in
      the future.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 13 Nov, 2017 1 commit
    • Add structure representing hash algorithm · f50e766b
      brian m. carlson authored
      Since in the future we want to support an additional hash algorithm, add
      a structure that represents a hash algorithm and all the data that must
      go along with it.  Add a constant to allow easy enumeration of hash
      algorithms.  Implement function typedefs to create an abstract API that
      can be used by any hash algorithm, and wrappers for the existing SHA1
      functions that conform to this API.
      
      Expose a value for hex size as well as binary size.  While one will
      always be twice the other, the two values are both used extremely
      commonly throughout the codebase and providing both leads to improved
      readability.
      
      Don't include an entry in the hash algorithm structure for the null
      object ID.  As this value is all zeros, any suitably sized all-zero
      object ID can be used, and there's no need to store a given one on a
      per-hash basis.
      
      The current hash function transition plan envisions a time when we will
      accept input from the user that might be in SHA-1 or in the NewHash
      format.  Since we cannot know which the user has provided, add a
      constant representing the unknown algorithm to allow us to indicate that
      we must look the correct value up.  Provide dummy API functions that die
      in this case.
      
      Finally, include git-compat-util.h in hash.h so that the required types
      are available.  This aids people using automated tools in their editors.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
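The shape of such an abstraction might look like the sketch below. All `demo_`/`DEMO_` names are hypothetical, and a toy byte-sum checksum stands in for SHA-1 so the example stays self-contained; it is not Git's actual structure. It shows the pieces the message lists: an enumeration constant, function typedefs, both binary and hex sizes, and an "unknown" entry whose functions die.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical algorithm indices; 0 marks "not yet known". */
enum { DEMO_HASH_UNKNOWN = 0, DEMO_HASH_TOY = 1, DEMO_HASH_NALGOS = 2 };

/* Toy context: a running byte sum standing in for real hash state. */
struct demo_ctx { unsigned long sum; };

typedef void (*demo_init_fn)(struct demo_ctx *ctx);
typedef void (*demo_update_fn)(struct demo_ctx *ctx, const void *in, size_t len);
typedef void (*demo_final_fn)(unsigned char *out, struct demo_ctx *ctx);

struct demo_hash_algo {
    const char *name;
    size_t rawsz;   /* binary digest length */
    size_t hexsz;   /* always 2 * rawsz, kept separately for readability */
    demo_init_fn init_fn;
    demo_update_fn update_fn;
    demo_final_fn final_fn;
};

/* Dummy functions for the unknown algorithm: calling them is a bug. */
static void demo_die(void)
{
    fputs("BUG: unknown hash algorithm\n", stderr);
    exit(128);
}
static void demo_unknown_init(struct demo_ctx *ctx) { (void)ctx; demo_die(); }
static void demo_unknown_update(struct demo_ctx *ctx, const void *in, size_t len)
{
    (void)ctx; (void)in; (void)len;
    demo_die();
}
static void demo_unknown_final(unsigned char *out, struct demo_ctx *ctx)
{
    (void)out; (void)ctx;
    demo_die();
}

/* Wrappers adapting the toy checksum to the abstract API. */
static void demo_toy_init(struct demo_ctx *ctx) { ctx->sum = 0; }
static void demo_toy_update(struct demo_ctx *ctx, const void *in, size_t len)
{
    const unsigned char *p = in;
    while (len--)
        ctx->sum += *p++;
}
static void demo_toy_final(unsigned char *out, struct demo_ctx *ctx)
{
    int i;
    for (i = 0; i < 4; i++)   /* emit the sum as 4 big-endian bytes */
        out[i] = (unsigned char)(ctx->sum >> (8 * (3 - i)));
}

static const struct demo_hash_algo demo_hash_algos[DEMO_HASH_NALGOS] = {
    { "unknown", 0, 0, demo_unknown_init, demo_unknown_update, demo_unknown_final },
    { "toy",     4, 8, demo_toy_init,     demo_toy_update,     demo_toy_final },
};
```

Callers index the table with an algorithm constant and go through the function pointers, so adding another algorithm means adding one table row rather than touching each call site.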
  6. 16 Aug, 2017 1 commit
    • sha1dc: build git plumbing code more explicitly · 36f048c5
      Takashi Iwai authored
      The plumbing code between sha1dc and git is defined in
      sha1dc_git.[ch], but these aren't compiled / included directly but
      only via the indirect inclusion from sha1dc code.  This is slightly
      confusing when you try to trace the build flow.
      
      This patch brings the following changes for simplification:
      
        - Make sha1dc_git.c stand-alone and build from Makefile
      
        - Make sha1dc_git.h the common header, which in turn includes
          the appropriate sha1.h depending on the build configuration

        - Move comments for plumbing code from the header to definitions

      This is also meant as preliminary work for further plumbing with
      an external sha1dc shlib.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 03 Jul, 2017 1 commit
  8. 17 Mar, 2017 1 commit
    • Makefile: add DC_SHA1 knob · 8325e43b
      Jeff King authored
      This knob lets you use the sha1dc implementation from:
      
            https://github.com/cr-marcstevens/sha1collisiondetection
      
      which can detect certain types of collision attacks (even
      when we only see half of the colliding pair). So it
      mitigates any attack which consists of getting the "good"
      half of a collision into a trusted repository, and then
      later replacing it with the "bad" half. The "good" half is
      rejected by the victim's version of Git (and even if they
      run an old version of Git, any sha1dc-enabled git will
      complain loudly if it ever has to interact with the object).
      
      The big downside is that it's slower than either the openssl
      or block-sha1 implementations.
      
      Here are some timings based off of linux.git:
      
        - compute sha1 over whole packfile
            sha1dc: 3.580s
          blk-sha1: 2.046s (-43%)
           openssl: 1.335s (-62%)
      
        - rev-list --all --objects
            sha1dc: 33.512s
          blk-sha1: 33.514s (+0.0%)
           openssl: 33.650s (+0.4%)
      
        - git log --no-merges -10000 -p
            sha1dc: 8.124s
          blk-sha1: 7.986s (-1.6%)
           openssl: 8.203s (+0.9%)
      
        - index-pack --verify
            sha1dc: 4m19s
          blk-sha1: 2m57s (-32%)
           openssl: 2m19s (-42%)
      
      So overall the sha1 computation with collision detection is
      about 1.75x slower than block-sha1, and 2.7x slower than
      openssl. But of course most operations do more than just sha1.
      Normal object access isn't really slowed at all (both the
      +/- changes there are well within the run-to-run noise); any
      changes are drowned out by the other work Git is doing.
      
      The most-affected operation is `index-pack --verify`, which
      is essentially just computing the sha1 on every object. This
      is similar to the `index-pack` invocation that the receiver
      of a push or fetch would perform. So clearly there's some
      extra CPU load here.
      
      There will also be some latency for the user, though keep in
      mind that such an operation will generally be network bound
      (this is about a 1.2GB packfile). Some of that extra CPU is
      "free" in the sense that we use it while the pack is
      streaming in anyway. But most of it comes during the
      delta-resolution phase, after the whole pack has been
      received. So we can imagine that for this (quite large)
      push, the user might have to wait an extra 100 seconds over
      openssl (which is what we use now). If we assume they can
      push to us at 20Mbit/s, that's 480s for a 1.2GB pack, which
      is only 20% slower.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
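The slowdown factors and the transfer-time estimate in this message follow from simple arithmetic on the quoted timings. The sketch below reproduces them; the function names are hypothetical, and it assumes 1 GB = 1000 MB and 8 bits per byte, as the 480s figure implies.

```c
/* Ratio of a slower timing to a faster one, e.g. sha1dc vs. openssl. */
static double sketch_slowdown(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Seconds needed to move `gigabytes` over a link of `mbit_per_s`. */
static double sketch_transfer_seconds(double gigabytes, double mbit_per_s)
{
    return gigabytes * 1000.0 * 8.0 / mbit_per_s;
}
```

With the whole-packfile timings, `sketch_slowdown(3.580, 2.046)` is about 1.75 and `sketch_slowdown(3.580, 1.335)` is about 2.7; `sketch_transfer_seconds(1.2, 20.0)` gives 480, so an extra 100 seconds is roughly a 20% increase.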
  9. 15 Mar, 2017 1 commit
    • hash.h: move SHA-1 implementation selection into a header file · f18f816c
      brian m. carlson authored
      Many developers use functionality in their editors that allows for quick
      syntax checks, including warning about questionable constructs.  This
      functionality allows rapid development with fewer errors.  However, such
      functionality generally does not allow the specification of
      project-specific defines or command-line options.
      
      Since the SHA1_HEADER include is not defined in such a case,
      developers see spurious errors when using these tools.  Furthermore,
      there are known implementations of "cc" whose '#include' is unhappy
      with this construct.
      
      Instead of using SHA1_HEADER, create a hash.h header and use #if
      and #elif to select the desired header.  Have the Makefile pass an
      appropriate option to help the header select the right implementation to
      use.
      
      [jc: make BLK_SHA1 the fallback default as discussed on list,
      e.g. <20170314201424.vccij5z2ortq4a4o@sigill.intra.peff.net>; also
      remove SHA1_HEADER and SHA1_HEADER_SQ that are no longer used].
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      Reviewed-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
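The difference between the two approaches can be sketched as follows. The old construct was a computed include (`#include SHA1_HEADER`), which fails in any tool that is not given the define. The replacement selects with `#if`/`#elif` and has a fallback. The macro names below are hypothetical stand-ins for the options the Makefile passes, and a string constant stands in for each real `#include` so the example compiles on its own.

```c
/*
 * Hypothetical knobs, normally supplied by the build as -D options.
 * In a real header each branch would #include the matching sha1.h.
 */
#if defined(SKETCH_SHA1_OPENSSL)
#define SKETCH_SHA1_IMPL "openssl"
#elif defined(SKETCH_SHA1_DC)
#define SKETCH_SHA1_IMPL "sha1dc"
#else
/* No knob set (e.g. an editor's syntax checker): use the fallback. */
#define SKETCH_SHA1_IMPL "block-sha1"
#endif

/* Report which implementation the preprocessor selected. */
static const char *sketch_sha1_impl(void)
{
    return SKETCH_SHA1_IMPL;
}
```

Because the final `#else` branch always resolves, a tool that compiles the header with no project-specific options still sees a valid implementation instead of an undefined macro.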
  10. 18 Nov, 2013 1 commit
  11. 17 Mar, 2013 1 commit
  12. 19 Feb, 2011 1 commit
  13. 09 Mar, 2008 1 commit
  14. 27 Oct, 2007 1 commit
    • Do linear-time/space rename logic for exact renames · 9027f53c
      Linus Torvalds authored
      This implements a smarter rename detector for exact renames, which
      rather than doing a pairwise comparison (time O(m*n)) will just hash the
      files into a hash-table (size O(n+m)), and only do pairwise comparisons
      to renames that have the same hash (time O(n+m) except for unrealistic
      hash collisions, which we just cull aggressively).
      
      Admittedly the exact rename case is not nearly as interesting as the
      generic case, but it's an important case none-the-less. A similar general
      approach should work for the generic case too, but even then you do need
      to handle the exact renames/copies separately (to avoid the inevitable
      added cost factor that comes from the _size_ of the file), so this is
      worth doing.
      
      In the expectation that we will indeed do the same hashing trick for the
      general rename case, this code uses a generic hash-table implementation
      that can be used for other things too.  In fact, we might be able to
      consolidate some of our existing hash tables with the new generic code
      in hash.[ch].
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
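The idea of replacing the O(m*n) pairwise scan with hashing can be sketched as below. This is a simplified illustration, not the hash.[ch] API from the commit: the `sketch_` names, the fixed bucket count, and the use of a small integer as the content hash are all assumptions made to keep it short.

```c
#include <stddef.h>

#define SKETCH_NBUCKETS 64

struct sketch_file {
    const char *path;
    unsigned int content_hash;  /* stand-in for the object's hash */
    int next;                   /* next source in the same bucket, or -1 */
};

/*
 * For each destination, find a source with identical content.
 * match[j] receives the source index for dst[j], or -1 if none.
 * Returns the number of destinations matched.
 */
static size_t sketch_exact_renames(struct sketch_file *src, size_t nsrc,
                                   const struct sketch_file *dst, size_t ndst,
                                   int *match)
{
    int heads[SKETCH_NBUCKETS];
    size_t i, found = 0;

    for (i = 0; i < SKETCH_NBUCKETS; i++)
        heads[i] = -1;

    /* Pass 1: chain every source into its bucket, O(nsrc). */
    for (i = 0; i < nsrc; i++) {
        unsigned int b = src[i].content_hash % SKETCH_NBUCKETS;
        src[i].next = heads[b];
        heads[b] = (int)i;
    }

    /* Pass 2: compare each destination only against its own bucket. */
    for (i = 0; i < ndst; i++) {
        int s = heads[dst[i].content_hash % SKETCH_NBUCKETS];
        match[i] = -1;
        for (; s >= 0; s = src[s].next) {
            if (src[s].content_hash == dst[i].content_hash) {
                match[i] = s;
                found++;
                break;
            }
        }
    }
    return found;
}
```

Pairwise comparison happens only within a bucket, so the total work stays close to O(n+m) unless many unrelated files land in the same bucket.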