    untracked cache: record/validate dir mtime and reuse cached output · 91a2288b
    Duy Nguyen authored and Junio C Hamano committed
    
    
    The main readdir loop in read_directory_recursive() is replaced with a
    new one that checks whether the cached results of a directory are still
    valid.
    
    If a file is added to or removed from the index, the containing
    directory is invalidated (but not its subdirs). If a directory's mtime
    changes, the same happens. If a .gitignore is updated, the containing
    directory and all subdirs are invalidated recursively. If
    dir_struct#flags or other conditions change, the cache is ignored.
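    
    The difference between the two invalidation scopes above can be
    sketched as follows. This is only an illustration: struct cache_dir,
    invalidate_one() and invalidate_recursive() are hypothetical names
    made up for this sketch, not the data structures or functions this
    patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache node; NOT the structure used by this patch. */
struct cache_dir {
	int valid;                  /* non-zero: cached listing may be reused */
	struct cache_dir **subdirs; /* direct subdirectories */
	size_t nr_subdirs;
};

/*
 * An index entry was added/removed in this directory, or the
 * directory's mtime changed: only this directory is invalidated.
 */
static void invalidate_one(struct cache_dir *dir)
{
	assert(dir);
	dir->valid = 0;
}

/*
 * A .gitignore in this directory was updated: its rules can affect
 * everything below, so the whole subtree is invalidated.
 */
static void invalidate_recursive(struct cache_dir *dir)
{
	size_t i;

	assert(dir);
	dir->valid = 0;
	for (i = 0; i < dir->nr_subdirs; i++)
		invalidate_recursive(dir->subdirs[i]);
}
```

    With a parent and one child both marked valid, invalidate_one() on
    the parent leaves the child's cached listing usable, while
    invalidate_recursive() on the parent clears both.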
    
    If a directory is invalidated, we opendir/readdir/closedir and run the
    exclude machinery on that directory listing as usual. If the untracked
    cache is also enabled, we update the cache along the way. If a
    directory is validated, we simply pull the untracked listing out of
    the cache. The cache also records the list of direct subdirs that we
    have to recurse into. Fully excluded directories are seen as
    "untracked files".
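    
    A minimal sketch of the validated/invalidated split described above,
    assuming a POSIX system. struct dir_cache and scan_dir() are
    hypothetical names for this illustration; the entry count stands in
    for the cached untracked listing, and the whole-second mtime
    comparison is a simplification rather than the check this patch
    performs.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-directory record; NOT this patch's structure. */
struct dir_cache {
	int valid;       /* non-zero once the record has been filled in */
	time_t mtime;    /* directory mtime seen when the record was built */
	int nr_entries;  /* stand-in for the cached untracked listing */
};

/*
 * Returns 1 if the cached record was reused (a single stat() call),
 * 0 if the directory had to be rescanned, -1 on error.
 */
static int scan_dir(const char *path, struct dir_cache *c)
{
	struct stat st;
	struct dirent *e;
	DIR *d;

	assert(path && c);
	if (stat(path, &st) < 0)
		return -1;
	if (c->valid && c->mtime == st.st_mtime)
		return 1; /* validated: no opendir/readdir needed */

	d = opendir(path);
	if (!d)
		return -1;
	c->nr_entries = 0;
	while ((e = readdir(d)) != NULL) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		/* real code would run the exclude machinery here */
		c->nr_entries++;
	}
	closedir(d);

	/* refresh the record along the way */
	c->mtime = st.st_mtime;
	c->valid = 1;
	return 0;
}
```

    Calling scan_dir() twice on an unchanged directory rescans on the
    first call and reuses the record on the second; clearing "valid"
    forces a rescan again.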
    
    In the best case when no dirs are invalidated, read_directory()
    becomes a series of
    
      stat(dir), open(.gitignore), fstat(), read(), close() and optionally
      hash_sha1_file()
    
    For comparison, standard read_directory() is a sequence of
    
      opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the
      expensive last_exclude_matching() and closedir().
    
    We already try not to open(.gitignore) if we know it does not exist,
    so the open/fstat/read/close sequence does not apply to every
    directory. The sequence could be reduced further, as noted in
    prep_exclude() in another patch. So in theory, the entire best-case
    read_directory() sequence could be reduced to a series of stat() calls
    and nothing else.
    
    This is not a silver bullet. When you compile a C file, for
    example, the old .o file is removed and a new one with the same name
    is created, effectively invalidating the containing directory's cache
    (but not its subdirectories). If your build process touches every
    directory, this cache adds extra overhead for nothing, so it's a good
    idea to separate generated files from tracked files. Editors may use
    the same strategy for saving files. And of course you're out of luck
    if you run your repo on an unsupported filesystem and/or operating
    system.
    
    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>