    git-add: make the entry stat-clean after re-adding the same contents · fb63d7f8
    Junio C Hamano authored
    Earlier, in commit 0781b8a9 (add_file_to_index: skip rehashing
    if the cached stat already matches), add_file_to_index() was
    taught not to re-add the path if it already matches the index.
    
    The change meant well, but was not executed quite right.  It
    used ie_modified() to see if the file on the work tree is really
    different from the index, and skipped adding the contents if the
    function says "not modified".
    
    This was wrong.  There are three possible comparison results
    between the index and the file in the work tree:
    
     - with lstat(2) we _know_ they are different.  E.g. if the
       length or the owner in the cached stat information is
       different from what we just obtained from lstat(2), we can
       tell the file is modified without looking at the actual
       contents.
    
     - with lstat(2) we _know_ they are the same.  The same length,
       the same owner, the same everything (but this has a twist, as
       described below).
    
     - we cannot tell from lstat(2) information alone and need to go
       to the filesystem to actually compare.
    
    The last case arises from what we call the 'racy git'
    situation, which can be caused by this sequence:
    
        $ echo hello >file
        $ git add file
        $ echo aeiou >file ;# the same length
    
    If the second "echo" is done within the same filesystem
    timestamp granularity as the first "echo", then the timestamp
    recorded by "git add" and the timestamp we get from lstat(2)
    will be the same, and we can mistakenly say the file is not
    modified.  The path is called 'racily clean'.  We need to
    reliably detect when racily clean paths are in fact modified.
    
    To solve this problem, when we write out the index, we mark the
    index entry that has the same timestamp as the index file itself
    (that is the time from the point of view of the filesystem) to
    tell any later code that does the lstat(2) comparison not to
    trust the cached stat info, and ie_modified() then actually goes
    to the filesystem to compare the contents for such a path.
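
    The three outcomes described above can be sketched as a small
    classifier.  This is a simplified model and not git's actual
    code: struct cached_stat, enum stat_verdict and classify() are
    hypothetical names, only three stat fields are compared, and an
    entry is assumed to be racily clean when its cached mtime is
    not older than the timestamp of the index file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the stat data cached in an index entry. */
struct cached_stat {
    int64_t mtime;   /* modification time, in filesystem granularity */
    uint64_t size;   /* file length */
    uint32_t uid;    /* owner */
};

/* The three possible results of comparing cached and fresh stat data. */
enum stat_verdict { STAT_DIFFERENT, STAT_SAME, STAT_UNSURE };

/*
 * Compare the cached stat data with what lstat(2) just returned.
 * index_mtime is the timestamp of the index file itself (an
 * assumption of this sketch: it is passed in by the caller).
 */
enum stat_verdict classify(const struct cached_stat *cached,
                           const struct cached_stat *fresh,
                           int64_t index_mtime)
{
    assert(cached && fresh);

    /* Any differing field proves the file changed. */
    if (cached->mtime != fresh->mtime ||
        cached->size != fresh->size ||
        cached->uid != fresh->uid)
        return STAT_DIFFERENT;

    /*
     * Everything matches, but the entry is not older than the index
     * file: the file could have been rewritten within the same
     * timestamp granularity, so the match cannot be trusted.
     */
    if (cached->mtime >= index_mtime)
        return STAT_UNSURE;

    return STAT_SAME;
}
```

    Only the STAT_UNSURE result requires reading the file contents
    to settle the question.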
    
    That's all good, but it should not be used for this "git add"
    optimization, as the goal of "git add" is to actually update the
    path in the index and make it stat-clean.  With the false
    optimization, we did _not_ cause any data loss (after all, what
    we failed to do was only to update the cached stat information),
    but it made the following sequence leave the file stat dirty:
    
        $ echo hello >file
        $ git add file
        $ echo hello >file ;# the same contents
        $ git add file
    
    The solution is not to use ie_modified(), which goes to the
    filesystem to see if the path is really clean, but instead to
    use ie_match_stat() with the "assume racily clean paths are
    dirty" option, to force re-adding of such a path.
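
    The difference between the two checks can be sketched as
    follows, using the simplified three-way model from this
    message.  The enum and both function names are hypothetical
    and are not git's API; contents_differ stands for the result
    of actually comparing the file with the indexed blob.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of comparing cached stat data with lstat(2), as modeled in this message. */
enum stat_verdict { STAT_DIFFERENT, STAT_SAME, STAT_UNSURE };

/*
 * Models the content-checking behavior: when the stat data cannot
 * settle the question, trust the content comparison.
 */
bool modified_by_content(enum stat_verdict v, bool contents_differ)
{
    assert(v == STAT_DIFFERENT || v == STAT_SAME || v == STAT_UNSURE);
    if (v == STAT_UNSURE)
        return contents_differ;
    return v == STAT_DIFFERENT;
}

/*
 * Models the "assume racily clean paths are dirty" option: anything
 * the stat data cannot prove clean is treated as modified.
 */
bool modified_assume_racy_dirty(enum stat_verdict v)
{
    assert(v == STAT_DIFFERENT || v == STAT_SAME || v == STAT_UNSURE);
    return v != STAT_SAME;
}
```

    For a path that the stat data cannot settle and whose contents
    are unchanged, the first check says "not modified", so "git
    add" would skip it and leave it stat dirty; the second says
    "modified", so the path is re-added and becomes stat-clean.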
    
    There was another problem with "git add -u".  The codepath
    shares the same issue when adding the paths that are found to be
    modified, but in addition, it asked the run_diff_files()
    function (the machinery behind "git diff-files") to list the
    paths that are modified.  That machinery uses the same
    ie_modified() call, so it does not report racily clean _and_
    actually clean paths as modified, which is not what we want.
    
    The patch allows the callers of run_diff_files() to pass the
    same "assume racily clean paths are dirty" option, and makes
    the "git add -u" codepath use that option, to discover and
    re-add racily clean _and_ actually clean paths.
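
    As an illustration of how a caller-supplied option changes the
    list of paths, here is a minimal sketch.  It does not reproduce
    the real run_diff_files() signature; struct entry,
    list_modified() and the assume_racy_dirty flag are hypothetical
    names, and each entry carries a precomputed stat verdict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum stat_verdict { STAT_DIFFERENT, STAT_SAME, STAT_UNSURE };

/* Hypothetical index entry reduced to what this sketch needs. */
struct entry {
    const char *path;
    enum stat_verdict verdict;  /* outcome of the stat comparison */
    bool contents_differ;       /* outcome of a full content comparison */
};

/*
 * Store the paths considered modified in out[] (which must have room
 * for n pointers) and return how many were stored.  With
 * assume_racy_dirty set, paths the stat data cannot settle are
 * listed without looking at their contents.
 */
size_t list_modified(const struct entry *entries, size_t n,
                     bool assume_racy_dirty, const char **out)
{
    size_t found = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct entry *e = &entries[i];
        bool modified;

        if (e->verdict == STAT_UNSURE)
            modified = assume_racy_dirty || e->contents_differ;
        else
            modified = (e->verdict == STAT_DIFFERENT);

        if (modified)
            out[found++] = e->path;
    }
    return found;
}
```

    A caller modeling "git add -u" would pass true for
    assume_racy_dirty, so that a racily clean path whose contents
    are unchanged is still listed and re-added.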
    
    We could further optimize on top of this patch to differentiate
    the case where the path really needs re-adding (i.e. the content
    of the racily clean entry was indeed different) and the case
    where only the cached stat information needs to be refreshed
    (i.e. the racily clean entry was actually clean), but I do not
    think it is worth it.
    
    This patch applies to maint and all the way up.
    
    Signed-off-by: Junio C Hamano <gitster@pobox.com>