    dir.c: unify is_excluded and is_path_excluded APIs · 95c6f271
    Karsten Blees authored
    The is_excluded and is_path_excluded APIs are very similar, except for a
    few noteworthy differences:
    
    is_excluded doesn't handle ignored directories: results for paths within
    ignored directories are incorrect. This is probably based on the premise
    that recursive directory scans should stop at ignored directories, which
    is no longer true (in certain cases, read_directory_recursive currently
    calls is_excluded *and* is_path_excluded to get correct ignored state).
    
    is_excluded caches parsed .gitignore files of the last directory in struct
    dir_struct. If the directory changes, it finds a common parent directory
    and is very careful to drop only as much state as necessary. On the other
    hand, is_excluded will also read and parse .gitignore files in already
    ignored directories, which are completely irrelevant.
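
    A minimal sketch of the "drop only as much state as necessary" step,
    assuming each cached level is identified only by the length of its
    directory prefix ("", "a/", "a/b/"). The names ignore_cache,
    common_dir_prefix() and drop_stale_levels() are hypothetical and not the
    actual dir.c code; real cache levels would also own the parsed patterns.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 64

/* Hypothetical, simplified model of the per-directory .gitignore cache. */
struct ignore_cache {
	size_t baselen[MAX_LEVELS]; /* prefix length of each cached level */
	int depth;                  /* number of cached levels */
};

/*
 * Length of the longest common leading directory of two paths. Only
 * positions just after a '/' count, so "ab/" and "a/" share nothing.
 */
static size_t common_dir_prefix(const char *a, const char *b)
{
	size_t i, last = 0;

	for (i = 0; a[i] && a[i] == b[i]; i++)
		if (a[i] == '/')
			last = i + 1;
	return last;
}

/*
 * Pop only the cached levels that are not parents of new_dir; levels for
 * the common parent and above stay valid. Returns the number dropped.
 */
static int drop_stale_levels(struct ignore_cache *c,
			     const char *old_dir, const char *new_dir)
{
	size_t keep = common_dir_prefix(old_dir, new_dir);
	int dropped = 0;

	assert(c->depth >= 0 && c->depth <= MAX_LEVELS);
	while (c->depth > 0 && c->baselen[c->depth - 1] > keep) {
		c->depth--;
		dropped++;
	}
	return dropped;
}
```

    Moving from "a/b/c/" to "a/b/d/" keeps the levels for "", "a/" and
    "a/b/" and drops only the one for "a/b/c/".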
    
    is_path_excluded correctly handles ignored directories by checking if any
    component in the path is excluded. As it uses is_excluded internally, this
    unfortunately forces is_excluded to drop and re-read all .gitignore files,
    as there is no common parent directory for the root dir.
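
    The component-wise check can be sketched as below. This is a hypothetical
    illustration only: path_excluded() and basename_matches() are made-up
    names, and matching is reduced to comparing one path component against a
    list of literal names instead of real gitignore patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pattern matching: path[0..len) is "excluded"
 * if its last component equals one of the literal names (no globs).
 */
static bool basename_matches(const char *path, size_t len,
			     const char *const *names, size_t n)
{
	size_t start = len;

	while (start > 0 && path[start - 1] != '/')
		start--;
	for (size_t i = 0; i < n; i++)
		if (strlen(names[i]) == len - start &&
		    !strncmp(path + start, names[i], len - start))
			return true;
	return false;
}

/*
 * The path is excluded if any leading directory ("build", "build/obj")
 * or the full path itself is excluded.
 */
static bool path_excluded(const char *path,
			  const char *const *names, size_t n)
{
	size_t len;

	assert(path != NULL);
	len = strlen(path);
	for (size_t i = 1; i <= len; i++)
		if ((i == len || path[i] == '/') &&
		    basename_matches(path, i, names, n))
			return true;
	return false;
}
```

    With "obj" in the list, "build/obj/x.o" is excluded because of its
    "build/obj" component, while "objects/a.c" is not.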
    
    is_path_excluded tracks state in a separate struct path_exclude_check,
    which is essentially a wrapper of dir_struct with two more fields. However,
    as is_path_excluded also modifies dir_struct, it is not possible to e.g.
    use multiple path_exclude_check structures with the same dir_struct in
    parallel. The additional structure just unnecessarily complicates the API.
    
    Teach is_excluded / prep_exclude about ignored directories: whenever
    entering a new directory, first check if the entire directory is excluded.
    Remember the excluded state in dir_struct. Don't traverse into already
    ignored directories (i.e. don't read irrelevant .gitignore files).
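
    A hypothetical sketch of that flow, not the actual prep_exclude code:
    prep_state, prep_dirs() and dir_listed() are made-up names, directory
    exclusion is reduced to an exact path match, and gitignore_reads only
    counts the subdirectory .gitignore files that would have been read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the state kept in dir_struct. */
struct prep_state {
	char excluded_base[256]; /* remembered excluded directory, with '/' */
	size_t excluded_len;     /* 0 when nothing is remembered */
	int gitignore_reads;     /* subdirectory .gitignore files "read" */
};

/* Hypothetical matcher: directory path[0..len) equals a listed path. */
static bool dir_listed(const char *path, size_t len,
		       const char *const *dirs, size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (strlen(dirs[k]) == len && !memcmp(path, dirs[k], len))
			return true;
	return false;
}

/* Returns true if path lies inside an excluded directory. */
static bool prep_dirs(struct prep_state *s, const char *path,
		      const char *const *ignored_dirs, size_t n)
{
	size_t len = strlen(path);

	/* Still inside the remembered excluded directory: no work at all. */
	if (s->excluded_len && len >= s->excluded_len &&
	    !memcmp(path, s->excluded_base, s->excluded_len))
		return true;

	s->excluded_len = 0;
	for (size_t i = 0; i < len; i++) {
		if (path[i] != '/')
			continue;
		/* Check the directory itself before reading its .gitignore. */
		if (dir_listed(path, i, ignored_dirs, n)) {
			assert(i + 1 < sizeof(s->excluded_base));
			memcpy(s->excluded_base, path, i + 1);
			s->excluded_len = i + 1;
			return true; /* don't descend into it */
		}
		s->gitignore_reads++; /* would read path[0..i]/.gitignore */
	}
	return false;
}
```

    With "build" ignored, checking "build/obj/x.o" remembers "build/", so a
    following "build/tmp/y.o" is answered from that state without touching
    any .gitignore file below build/.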
    
    Directories could also be excluded by exclude patterns specified on the
    command line or .git/info/exclude, so we cannot simply skip prep_exclude
    entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
    Move this check to just before actually reading the file.
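
    To illustrate the reordering, here is a hypothetical per-directory helper,
    plan_dir(), that is not part of dir.c. The caller is assumed to have
    already decided dir_excluded from all pattern sources; the point is only
    the order of the checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum dir_action { DIR_EXCLUDED, DIR_NO_FILE, DIR_READ_FILE };

static enum dir_action plan_dir(const char *dir, bool dir_excluded,
				const char *exclude_per_dir,
				char *buf, size_t bufsz)
{
	/* 1. Exclusion of the directory itself comes first: patterns from
	 *    the command line or .git/info/exclude apply even when no
	 *    per-directory file name is configured. */
	if (dir_excluded)
		return DIR_EXCLUDED;

	/* 2. Only now does a missing file name matter: there is nothing to
	 *    read for this directory, but the walk itself continues. */
	if (!exclude_per_dir)
		return DIR_NO_FILE;

	/* 3. Build "<dir><exclude_per_dir>", e.g. "src/" + ".gitignore". */
	assert(strlen(dir) + strlen(exclude_per_dir) < bufsz);
	snprintf(buf, bufsz, "%s%s", dir, exclude_per_dir);
	return DIR_READ_FILE;
}
```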
    
    is_path_excluded is now equivalent to is_excluded, so we can simply
    redirect to it (the public API is cleaned up in the next patch).
    
    The performance impact of the additional ignored check per directory is
    hardly noticeable when reading directories recursively (e.g. 'git status').
    However, performance of git commands using the is_path_excluded API (e.g.
    'git ls-files --cached --ignored --exclude-standard') is greatly improved
    as this no longer re-reads .gitignore files on each call.
    
    Here's some performance data from the linux and WebKit repos (best of 10
    runs on Debian Linux on an SSD, core.preloadIndex=true; "gain" is the
    ratio before/after, so values above 1 are speedups):
    
           | ls-files -ci   |    status      | status --ignored
           | linux | WebKit | linux | WebKit | linux | WebKit
    -------+-------+--------+-------+--------+-------+---------
    before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
    after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
    gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985

    Signed-off-by: Karsten Blees <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>