    is_ntfs_dotgit: match other .git files · e7cb0b44
    Johannes Schindelin authored
    When we started to catch NTFS short names that clash with .git, we only
    looked for GIT~1. That is sufficient for .git itself because we only
    ever clone into an empty directory, so .git is guaranteed to be the
    first subdirectory or file in that directory.
    
    However, even with a fresh clone, .gitmodules is *not* necessarily the
    first file to be written that would want the NTFS short name GITMOD~1: a
    malicious repository can add .gitmodul0000 and friends, which sorts
    before `.gitmodules` and is therefore checked out *first*. For that
    reason, we have to test not only for ~1 short names, but for others,
    too.
    
    It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
    Windows 2000 (i.e., in all Windows versions still supported by Git),
    NTFS short names are only generated in the <prefix>~<number> form up to
    number 4. After that, a *different* prefix is used, calculated from the
    long file name using an undocumented, but stable algorithm.
    
    For example, the short name of .gitmodules would be GITMOD~1, but if it
    is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
    GI7EBA~1 will be used. From there, collisions are handled by
    incrementing the number, shortening the prefix as needed (until ~9999999
    is reached, in which case NTFS will not allow the file to be created).
    
    We'd also want to handle .gitignore and .gitattributes, which suffer
    from a similar problem, using the fall-back short names GI250A~1 and
    GI7D29~1, respectively.
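The two short-name forms described above can be sketched as a small matcher. This is only an illustration, not the code in is_ntfs_dotgit(): the helper names has_prefix_icase() and looks_like_short_name() are made up for this example, the sketch assumes that a generated short name is always exactly eight characters long, and it ignores trailing spaces, periods and alternate data streams.

```c
#include <ctype.h>
#include <string.h>

/* Compare the first n bytes of s, case-insensitively, to lowercase ref. */
static int has_prefix_icase(const char *s, const char *ref, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)s[i]) != (unsigned char)ref[i])
			return 0;
	return 1;
}

/*
 * Hypothetical helper: does `name` look like an NTFS short name derived
 * from a file whose regular prefix is `regular` (e.g. "gitmod") and whose
 * fall-back prefix is `fallback` (e.g. "gi7eba")? Both prefixes must be
 * six lowercase characters.
 */
static int looks_like_short_name(const char *name, const char *regular,
				 const char *fallback)
{
	size_t tilde, i;

	/* Assumption: generated short names fill all eight characters. */
	if (strlen(name) != 8)
		return 0;

	/* Regular form: six-character prefix followed by ~1 ... ~4. */
	if (has_prefix_icase(name, regular, 6) && name[6] == '~' &&
	    name[7] >= '1' && name[7] <= '4')
		return 1;

	/* Fall-back form: possibly shortened hash prefix, '~', number. */
	for (tilde = 0; tilde <= 6 && name[tilde] != '~'; tilde++)
		; /* locate the first tilde */
	if (tilde > 6 || !has_prefix_icase(name, fallback, tilde))
		return 0;
	if (name[tilde + 1] < '1' || name[tilde + 1] > '9')
		return 0; /* the number never starts with 0 */
	for (i = tilde + 2; i < 8; i++)
		if (!isdigit((unsigned char)name[i]))
			return 0;
	return 1;
}
```

With the prefixes "gitmod" and "gi7eba", this accepts GITMOD~1 through GITMOD~4, GI7EBA~1 and shortened variants such as GI7EB~10, and rejects GITMOD~5 and GI7EBA~0.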
    
    To accommodate that, we could reimplement the hashing algorithm, but it
    is safer and simpler to provide the known prefixes. The algorithm has
    been reverse-engineered and described at
    https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
    still available via https://web.archive.org/.
    
    The prefixes can be recomputed by running the following Perl script:
    
    -- snip --
    use warnings;
    use strict;
    
    sub compute_short_name_hash ($) {
            my $checksum = 0;
            foreach (split('', $_[0])) {
                    $checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
            }
    
            $checksum = ($checksum * 314159269) & 0xffffffff;
            $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
            $checksum -= (($checksum * 1152921497) >> 60) * 1000000007;
    
            return scalar reverse sprintf("%x", $checksum & 0xffff);
    }
    
    print compute_short_name_hash($ARGV[0]);
    -- snap --
    
    E.g., running that script with the argument ".gitignore" prints "250a"
    (which then becomes "gi250a" in the code).
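For readers without Perl at hand, the same computation can be sketched in C. This is a straight port of the script above, not code taken from Git; the output-buffer parameter and the use of uint64_t (so that the intermediate products do not overflow) are choices made for this example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Port of the Perl compute_short_name_hash() above. Writes up to four
 * lowercase hex digits (in reversed order, as the Perl script does)
 * plus a terminating NUL into `out`.
 */
static void compute_short_name_hash(const char *name, char out[5])
{
	uint64_t checksum = 0;
	char hex[5];
	int len, i;

	/* 16-bit rolling checksum over the bytes of the long name. */
	for (; *name; name++)
		checksum = (checksum * 0x25 + (unsigned char)*name) & 0xffff;

	checksum = (checksum * 314159269ULL) & 0xffffffffULL;
	/* Take the absolute value of the 32-bit signed interpretation. */
	if (checksum & 0x80000000ULL)
		checksum = 1 + (~checksum & 0x7fffffffULL);
	checksum -= ((checksum * 1152921497ULL) >> 60) * 1000000007ULL;

	/* Like Perl's "%x": no zero padding; then reverse the digits. */
	len = snprintf(hex, sizeof(hex), "%x", (unsigned)(checksum & 0xffff));
	for (i = 0; i < len; i++)
		out[i] = hex[len - 1 - i];
	out[len] = '\0';
}
```

Calling compute_short_name_hash(".gitignore", buf) leaves "250a" in buf, matching the Perl output quoted above; ".gitmodules" and ".gitattributes" give "7eba" and "7d29", the hash parts of GI7EBA~1 and GI7D29~1.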
    
    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
    Signed-off-by: Jeff King <peff@peff.net>