1. 22 Feb, 2016 1 commit
  2. 21 Oct, 2015 1 commit
  3. 07 Jul, 2014 1 commit
      hashmap: add simplified hashmap_get_from_hash() API · ab73a9d1
      Karsten Blees authored
      Hashmap entries are typically looked up by just a key. The hashmap_get()
      API expects an initialized entry structure instead, to support compound
      keys. This flexibility is currently only needed by find_dir_entry() in
      name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
      (currently five) call sites of hashmap_get() have to set up a near-empty
      entry structure, resulting in duplicate code like this:
        struct hashmap_entry keyentry;
        hashmap_entry_init(&keyentry, hash(key));
        return hashmap_get(map, &keyentry, key);
      Add a hashmap_get_from_hash() API that allows hashmap lookups by just
      specifying the key and its hash code, i.e.:
        return hashmap_get_from_hash(map, hash(key), key);
      Signed-off-by: Karsten Blees <blees@dcon.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
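The difference between the two lookup styles in this commit can be sketched with a toy chained hash map. Everything below (the `toy_*` names, the fixed bucket count, the FNV-1a hash) is an assumption made for illustration and is not git's hashmap implementation; it only shows how a lookup-by-hash wrapper can hide the throwaway key entry that callers otherwise set up by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; not git's actual hashmap implementation. */
struct toy_entry {
	struct toy_entry *next;   /* chain within one bucket */
	unsigned int hash;        /* precomputed hash of the key */
	const char *key;
};

#define TOY_BUCKETS 16

struct toy_map {
	struct toy_entry *buckets[TOY_BUCKETS];
};

/* FNV-1a, used here only as a stand-in hash function. */
static unsigned int toy_hash(const char *key)
{
	unsigned int h = 2166136261u;

	while (*key)
		h = (h ^ (unsigned char)*key++) * 16777619u;
	return h;
}

static void toy_entry_init(struct toy_entry *e, unsigned int hash)
{
	e->next = NULL;
	e->hash = hash;
	e->key = NULL;
}

static void toy_add(struct toy_map *map, struct toy_entry *e)
{
	unsigned int b = e->hash % TOY_BUCKETS;

	e->next = map->buckets[b];
	map->buckets[b] = e;
}

/* General lookup: the caller supplies an initialized key entry. */
static struct toy_entry *toy_get(const struct toy_map *map,
				 const struct toy_entry *keyentry,
				 const char *key)
{
	struct toy_entry *e;

	assert(map && keyentry && key);
	for (e = map->buckets[keyentry->hash % TOY_BUCKETS]; e; e = e->next)
		if (e->hash == keyentry->hash && !strcmp(e->key, key))
			return e;
	return NULL;
}

/* Convenience wrapper: builds the throwaway key entry internally. */
static struct toy_entry *toy_get_from_hash(const struct toy_map *map,
					   unsigned int hash,
					   const char *key)
{
	struct toy_entry keyentry;

	toy_entry_init(&keyentry, hash);
	return toy_get(map, &keyentry, key);
}
```

With this shape, a call site that only has a key shrinks to a single `toy_get_from_hash(map, toy_hash(key), key)` call, while callers with compound keys can still use the general form.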
  4. 20 Jun, 2014 1 commit
  5. 24 Feb, 2014 1 commit
  6. 18 Nov, 2013 4 commits
  7. 17 Sep, 2013 2 commits
      name-hash: stop storing trailing '/' on paths in index_state.dir_hash · d28eec26
      Eric Sunshine authored
      When 5102c617 (Add case insensitivity support for directories when using
      git status, 2010-10-03) added directories to the name-hash there was
      only a single hash table in which both real cache entries and leading
      directory prefixes were registered. To distinguish between the two types
      of entries, directories were stored with a trailing '/'.
      2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
      2013-02-28), however, moved directories to a separate hash table
      (index_state.dir_hash) but retained the (now) redundant trailing '/',
      thus callers continue to bear the burden of ensuring the slash's
      presence before searching the index for a directory. Eliminate this
      redundancy by storing paths in the dir-hash without the trailing '/'.
      An important benefit of this change is that it eliminates undocumented
      and dangerous behavior of dir.c:directory_exists_in_index_icase() in
      which it assumes not only that it can validly access one character
      beyond the end of its incoming directory argument, but also that that
      character will unconditionally be a '/'. This perilous behavior was
      "tolerated" because the string passed in by its lone caller always had a
      '/' in that position; however, things broke [1] when 2eac2a4c (ls-files
      -k: a directory only can be killed if the index has a non-directory,
      2013-08-15) added a new caller which failed to respect the undocumented
      assumption.
      [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727
      Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      name-hash: refactor polymorphic index_name_exists() · db5360f3
      Eric Sunshine authored
      Depending upon the absence or presence of a trailing '/' on the incoming
      pathname, index_name_exists() checks either if a file is present in the
      index or if a directory is represented within the index. Each caller
      explicitly chooses the mode of operation by adding or removing a
      trailing '/' before invoking index_name_exists().
      Since these two modes of operations are disjoint and have no code in
      common (one searches index_state.name_hash; the other dir_hash), they
      can be represented more naturally as distinct functions: one to search
      for a file, and one for a directory.
      Splitting index searching into two functions relieves callers of the
      artificial burden of having to add or remove a slash to select the mode
      of operation; instead they just call the desired function. A subsequent
      patch will take advantage of this benefit in order to eliminate the
      requirement that the incoming pathname for a directory search must have
      a trailing slash.
      (In order to avoid disturbing in-flight topics, index_name_exists() is
      retained as a thin wrapper dispatching either to index_dir_exists() or
      Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
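A rough sketch of the refactoring described in this commit: two separate lookup functions, plus a thin wrapper that keeps the old "trailing slash selects the mode" behaviour for existing callers. The names `lookup_file()`, `lookup_dir()` and `lookup_name()` and the static tables are hypothetical stand-ins, and the sketch assumes the directory table stores names without the trailing '/'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two separate lookup tables. */
static const char *const sketch_files[] = { "Makefile", "src/main.c" };
static const char *const sketch_dirs[] = { "src" };

static int in_table(const char *const *table, size_t n,
		    const char *name, size_t len)
{
	size_t i;

	assert(table && name);
	for (i = 0; i < n; i++)
		if (strlen(table[i]) == len && !memcmp(table[i], name, len))
			return 1;
	return 0;
}

/* Search only the file table. */
static int lookup_file(const char *name, size_t len)
{
	return in_table(sketch_files, 2, name, len);
}

/* Search only the directory table; name carries no trailing '/'. */
static int lookup_dir(const char *name, size_t len)
{
	return in_table(sketch_dirs, 1, name, len);
}

/* Thin compatibility wrapper: a trailing '/' selects directory mode. */
static int lookup_name(const char *name, size_t len)
{
	if (len > 0 && name[len - 1] == '/')
		return lookup_dir(name, len - 1);
	return lookup_file(name, len);
}
```

New callers would call `lookup_file()` or `lookup_dir()` directly, so they no longer need to add or strip a slash just to pick the mode.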
  8. 17 Mar, 2013 1 commit
  9. 28 Feb, 2013 1 commit
      name-hash.c: fix endless loop with core.ignorecase=true · 2092678c
      Karsten Blees authored
      With core.ignorecase=true, name-hash.c builds a case insensitive index of
      all tracked directories. Currently, the existing cache entry structures are
      added multiple times to the same hashtable (with different name lengths and
      hash codes). However, there's only one dir_next pointer, which gets
      completely messed up in case of hash collisions. In the worst case, this
      causes an endless loop if ce == ce->dir_next (see t7062).
      Use a separate hashtable and separate structures for the directory index
      so that each directory entry has its own next pointer. Use reference
      counting to track which directory entry contains files.
      There are only slight changes to the name-hash.c API:
      - new free_name_hash() used by read-cache.c::discard_index()
      - remove_name_hash() takes an additional index_state parameter
      - index_name_exists() for a directory (trailing '/') may return a cache
        entry that has been removed (CE_UNHASHED). This is not a problem as the
        return value is only used to check if the directory exists (dir.c) or to
        normalize casing of directory names (read-cache.c).
      Getting rid of cache_entry.dir_next reduces memory consumption, especially
      with core.ignorecase=false (which doesn't use that member at all).
      With core.ignorecase=true, building the directory index is slightly faster
      as we add / check the parent directory first (instead of going through all
      directory levels for each file in the index). E.g. with WebKit (~200k
      files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
      to 130ms.
      Signed-off-by: Karsten Blees <blees@dcon.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
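The idea of a separate directory index with its own next pointers and reference counts can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the commit: the `dir_node` type and all function names are hypothetical, a single linked list stands in for the hash table, names are compared case-sensitively, and `remove_file()` assumes the path was previously added.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not git's real structures. */
struct dir_node {
	struct dir_node *next;    /* own chain pointer, not shared */
	struct dir_node *parent;  /* NULL for a top-level directory */
	int nr;                   /* files and subdirectories below */
	size_t len;
	char name[256];
};

static struct dir_node *dir_list;  /* one chain stands in for a hash table */

static struct dir_node *find_dir(const char *name, size_t len)
{
	struct dir_node *d;

	for (d = dir_list; d; d = d->next)
		if (d->len == len && !memcmp(d->name, name, len))
			return d;
	return NULL;
}

/*
 * Return the node for the directory part of path[0..len), creating it
 * (and, recursively, its parents) on first use.
 */
static struct dir_node *get_dir(const char *path, size_t len)
{
	struct dir_node *d;

	while (len && path[len - 1] != '/')   /* strip the last component */
		len--;
	if (!len)
		return NULL;                  /* no directory part */
	len--;                                /* drop the '/' itself */

	d = find_dir(path, len);
	if (!d) {
		assert(len < sizeof(d->name));
		d = calloc(1, sizeof(*d));
		assert(d);
		memcpy(d->name, path, len);
		d->len = len;
		d->next = dir_list;
		dir_list = d;
		/* Register with the parent once, when the node is created. */
		d->parent = get_dir(path, len);
		if (d->parent)
			d->parent->nr++;
	}
	return d;
}

static void add_file(const char *path)
{
	struct dir_node *d = get_dir(path, strlen(path));

	if (d)
		d->nr++;
}

/* Assumes path was added before. */
static void remove_file(const char *path)
{
	struct dir_node *d = get_dir(path, strlen(path));

	/* Propagate upwards while directories become empty. */
	while (d && --d->nr == 0)
		d = d->parent;
}

static int dir_exists(const char *name)
{
	struct dir_node *d = find_dir(name, strlen(name));

	return d && d->nr > 0;
}
```

Because each file only touches its immediate parent directory, and a parent is walked further up only when a node is created or becomes empty, adding a file does not have to visit every directory level each time.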
  10. 19 Feb, 2013 1 commit
      name-hash: allow hashing an empty string · c19387e7
      Junio C Hamano authored
      Usually we do not pass an empty string to the function hash_name()
      because we almost always ask for hash values for a path that is a
      candidate to be added to the index. However, check-ignore (and most
      likely check-attr, but I didn't check) apparently has a callchain
      to ask the hash value for an empty path when it was given a "." from
      the top-level directory to ask "Is the path . excluded by default?"
      Make sure that hash_name() does not overrun the end of the given
      pathname even when it is empty.
      Remove a sweep-the-issue-under-the-rug conditional in check-ignore
      that avoided passing an empty string to the callchain while at it.
      It is a valid question to ask for check-ignore if the top-level is
      set to be ignored by default, even though the answer is most likely
      no, if only because there is currently no way to specify such an
      entry in the .gitignore file. But it is an unusual thing to ask and
      it is not worth optimizing for it by special casing at the top level
      of the call chain.
      Signed-off-by: Adam Spiers <git@adamspiers.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
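To illustrate the kind of overrun this commit guards against, here is a minimal sketch of a length-bounded string hash. The function name `sketch_hash_name()` and the hash constants are assumptions for illustration, not a copy of git's hash_name(). The point is the loop shape: a `do { ... } while (--len)` loop reads one byte before it ever checks the length, so it misbehaves for `len == 0`, whereas a `while (len--)` loop checks first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hash the first len bytes of name. The length is tested before each
 * read, so an empty string (len == 0) never touches name at all.
 */
static unsigned int sketch_hash_name(const char *name, size_t len)
{
	unsigned int hash = 0x123;

	assert(name);
	while (len--) {
		unsigned char c = (unsigned char)*name++;
		hash = hash * 101 + c;
	}
	return hash;
}
```

For an empty path the function simply returns the initial value, and hashing a prefix of a longer string depends only on the bytes inside that prefix.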
  11. 01 Nov, 2011 1 commit
  12. 08 Oct, 2011 1 commit
      fix phantom untracked files when core.ignorecase is set · 2548183b
      Jeff King authored
      When core.ignorecase is turned on and there are stale index
      entries, "git commit" can sometimes report directories as
      untracked, even though they contain tracked files.
      You can see an example of this with:
          # make a case-insensitive repo
          git init repo && cd repo &&
          git config core.ignorecase true &&
          # with some tracked files in a subdir
          mkdir subdir &&
          > subdir/one &&
          > subdir/two &&
          git add . &&
          git commit -m base &&
          # now make the index entries stale
          touch subdir/* &&
          # and then ask commit to update those entries and show
          # us the status template
          git commit -a
      which will report "subdir/" as untracked, even though it
      clearly contains two tracked files. What is happening in the
      commit program is this:
        1. We load the index, and for each entry, insert it into the index's
           name_hash. In addition, if ignorecase is turned on, we make an
           entry in the name_hash for the directory (e.g., "contrib/"), which
           uses the following code from 5102c617's hash_index_entry_directories:
              hash = hash_name(ce->name, ptr - ce->name);
              if (!lookup_hash(hash, &istate->name_hash)) {
                      pos = insert_hash(hash, &istate->name_hash);
                      if (pos) {
                              ce->next = *pos;
                              *pos = ce;
                      }
              }
           Note that we only add the directory entry if there is not already an
           entry.
        2. We run add_files_to_cache, which gets updated information for each
           cache entry. It helpfully inserts this information into the cache,
           which calls replace_index_entry. This in turn calls
           remove_name_hash() on the old entry, and add_name_hash() on the new
           one. But remove_name_hash doesn't actually remove from the hash, it
           only marks it as "no longer interesting" (from cache.h):
            /*
             * We don't actually *remove* it, we can just mark it invalid so that
             * we won't find it in lookups.
             *
             * Not only would we have to search the lists (simple enough), but
             * we'd also have to rehash other hash buckets in case this makes the
             * hash bucket empty (common). So it's much better to just mark
             * it.
             */
            static inline void remove_name_hash(struct cache_entry *ce)
            {
                    ce->ce_flags |= CE_UNHASHED;
            }
           This is OK in the specific-file case, since the entries in the hash
           form a linked list, and we can just skip the "not here anymore"
           entries during lookup.
           But for the directory hash entry, we will _not_ write a new entry,
           because there is already one there: the old one that is actually no
           longer interesting!
        3. While traversing the directories, we end up in the
           directory_exists_in_index_icase function to see if a directory is
           interesting. This in turn checks index_name_exists, which will
           look up the directory in the index's name_hash. We see the old,
           deleted record, and assume there is nothing interesting. The
           directory gets marked as untracked, even though there are index
           entries in it.
      The problem is in the code I showed above:
              hash = hash_name(ce->name, ptr - ce->name);
              if (!lookup_hash(hash, &istate->name_hash)) {
                      pos = insert_hash(hash, &istate->name_hash);
                      if (pos) {
                              ce->next = *pos;
                              *pos = ce;
                      }
              }
      Having a single cache entry that represents the directory is
      not enough; that entry may go away if the index is changed.
      It may be tempting to say that the problem is in our removal
      method; if we removed the entry entirely instead of simply
      marking it as "not here anymore", then we would know we need
      to insert a new entry. But that only covers this particular
      case of remove-replace. In the more general case, consider
      something like this:
        1. We add "foo/bar" and "foo/baz" to the index. Each gets
           its own entry in name_hash, plus we make a "foo/"
           entry that points to "foo/bar".
        2. We remove the "foo/bar" entry from the index, and from
           the name_hash.
        3. We ask if "foo/" exists, and see no entry, even though
           "foo/baz" exists.
      So we need that directory entry to have the list of _all_
      cache entries that indicate that the directory is tracked.
      So that implies making a linked list as we do for other
      entries, like:
        hash = hash_name(ce->name, ptr - ce->name);
        pos = insert_hash(hash, &istate->name_hash);
        if (pos) {
                ce->next = *pos;
                *pos = ce;
        }
      But that's not right either. In fact, it shows a second bug
      in the current code, which is that the "ce->next" pointer is
      supposed to be linking entries for a specific filename
      entry, but here we are overwriting it for the directory
      entry. So the same cache entry ends up in two linked lists,
      but they share the same "next" pointer.
      As it turns out, this second bug can't be triggered in the
      current code. The "if (pos)" conditional is totally dead
      code; pos will only be non-NULL if there was an existing
      hash entry, and we already checked that there wasn't one
      through our call to lookup_hash.
      But fixing the first bug means taking out that call to
      lookup_hash, which is going to activate the buggy dead code,
      and we'll end up splicing the two linked lists together.
      So we need to have a separate next pointer for the list in
      the directory bucket, and we need to traverse that list in
      index_name_exists when we are looking up a directory.
      This bloats "struct cache_entry" by a few bytes. Which is
      annoying, because it's only necessary when core.ignorecase
      is enabled. There's not an easy way around it, short of
      separating out the "next" pointers from cache_entry entirely
      (i.e., having a separate "cache_entry_list" struct that gets
      stored in the name_hash). In practice, it probably doesn't
      matter; we have thousands of cache entries, compared to the
      millions of objects (where adding 4 bytes to the struct
      actually does impact performance).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
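The fix argued for in this message (every entry is linked into the directory list through its own pointer, and lookups skip lazily removed entries) can be sketched as below. The `sketch_*` names and the `SKETCH_UNHASHED` flag are hypothetical stand-ins, a single chain stands in for each hash bucket, and each entry is assumed to be added only once; this is an illustration of the idea rather than the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_UNHASHED 1u  /* hypothetical stand-in for CE_UNHASHED */

struct sketch_entry {
	struct sketch_entry *next;      /* chain for the file-name bucket */
	struct sketch_entry *dir_next;  /* separate chain for the directory bucket */
	unsigned int flags;
	const char *name;
};

/* One chain each stands in for a hash bucket. */
struct sketch_index {
	struct sketch_entry *files;
	struct sketch_entry *dirs;
};

/* Assumes e is not already linked into idx. */
static void sketch_add(struct sketch_index *idx, struct sketch_entry *e)
{
	e->flags &= ~SKETCH_UNHASHED;
	e->next = idx->files;
	idx->files = e;
	/*
	 * Always link into the directory chain, even if another entry
	 * for the same directory is already there.
	 */
	e->dir_next = idx->dirs;
	idx->dirs = e;
}

/* Lazy removal: only mark the entry, leave it linked. */
static void sketch_remove(struct sketch_entry *e)
{
	e->flags |= SKETCH_UNHASHED;
}

/* dir is given with its trailing '/', e.g. "foo/". */
static int sketch_dir_exists(const struct sketch_index *idx, const char *dir)
{
	const struct sketch_entry *e;
	size_t len = strlen(dir);

	assert(idx);
	for (e = idx->dirs; e; e = e->dir_next) {
		if (e->flags & SKETCH_UNHASHED)
			continue;               /* skip "not here anymore" */
		if (!strncmp(e->name, dir, len))
			return 1;               /* some live entry is below dir */
	}
	return 0;
}
```

In the "foo/bar" and "foo/baz" scenario from the message, removing "foo/bar" leaves "foo/baz" reachable through `dir_next`, so the directory is still reported as tracked.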
  13. 06 Oct, 2010 1 commit
      Add case insensitivity support for directories when using git status · 5102c617
      Joshua Jensen authored
      When using a case preserving but case insensitive file system, directory
      case can differ but still refer to the same physical directory.  git
      status reports the directory with the alternate case as an Untracked
      file.  (That is, when mydir/filea.txt is added to the repository and
      then the directory on disk is renamed from mydir/ to MyDir/, git status
      shows MyDir/ as being untracked.)
      Support has been added in name-hash.c for hashing directories with a
      terminating slash into the name hash. When index_name_exists() is called
      with a directory (a name with a terminating slash), the name is not
      found via the normal cache_name_compare() call, but it is found in the
      slow_same_name() function.
      Additionally, in dir.c, directory_exists_in_index_icase() allows newly
      added directories deeper in the directory chain to be identified.
      Ultimately, it would be better if the file list was read in case
      insensitive alphabetical order from disk, but this change seems to
      suffice for now.
      The end result is the directory is looked up in a case insensitive
      manner and does not show in the Untracked files list.
      Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
      Signed-off-by: Johannes Sixt <j6t@kdbg.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
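For a case-insensitive lookup to find "MyDir/" when "mydir/" was hashed, both the hash and the comparison have to fold case. The sketch below shows one way to do that; `icase_hash()` and `icase_equal()` are hypothetical names, the hash constants are assumptions, and only ASCII case folding via `toupper()` is considered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hash the first len bytes of name, folding case first. */
static unsigned int icase_hash(const char *name, size_t len)
{
	unsigned int hash = 0x123;

	assert(name);
	while (len--) {
		unsigned char c = (unsigned char)*name++;
		hash = hash * 101 + (unsigned int)toupper(c);
	}
	return hash;
}

/* Compare the first len bytes of a and b, ignoring case. */
static int icase_equal(const char *a, const char *b, size_t len)
{
	assert(a && b);
	while (len--) {
		if (toupper((unsigned char)*a++) != toupper((unsigned char)*b++))
			return 0;
	}
	return 1;
}
```

Names that differ only in case land in the same bucket and compare equal, which is what lets a renamed directory still match its tracked counterpart.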
  14. 09 Apr, 2008 3 commits