    split-index: don't compare cached data of entries already marked for split index · e3d83798
    Gábor Szeder authored and Junio C Hamano committed
    
    
    When unpack_trees() constructs a new index, it copies cache entries
    from the original index [1].  prepare_to_write_split_index() has to
    deal with this, and it has a dedicated code path for copied entries
    that are present in the shared index, where it compares the cached
    data in the corresponding copied and original entries.  If the cached
    data matches, then they are considered the same; if it differs, then
    the copied entry is marked, by setting its CE_UPDATE_IN_BASE flag,
    for inclusion as a replacement entry in the split index that is
    about to be written.
    
    However, a cache entry already has its CE_UPDATE_IN_BASE flag set upon
    reading the split index, if the entry already has a replacement entry
    there, or upon refreshing the cached stat data, if the corresponding
    file was modified.  The state of this flag is then preserved when
    unpack_trees() copies a cache entry from the shared index.
    
    So modify prepare_to_write_split_index() to check the copied cache
    entries' CE_UPDATE_IN_BASE flag first, and skip the thorough
    comparison of cached data if the flag is already set.  Those couple of
    lines comparing the cached data would then have too many levels of
    indentation, so extract them into a helper function.
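
    The flag-first check described above can be sketched roughly as
    follows.  This is a simplified, self-contained model, not the code
    in Git: 'struct entry', its fields, the flag's bit value, and the
    names cached_data_differs() and mark_copied_entry() are all
    hypothetical, and a single 'mtime' field stands in for the whole
    cached stat data.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit value; the real flag is defined in Git's headers. */
#define CE_UPDATE_IN_BASE (1u << 0)

/* Simplified stand-in for a cache entry: only what this sketch needs. */
struct entry {
	unsigned int index;    /* assumed: position in the shared index, 0 if none */
	unsigned int flags;
	unsigned int mtime;    /* stands in for all cached stat data */
	unsigned char oid[20]; /* stands in for the object name */
};

/* Helper: true if the cached data of the two entries differs. */
static bool cached_data_differs(const struct entry *a, const struct entry *b)
{
	return a->mtime != b->mtime ||
	       memcmp(a->oid, b->oid, sizeof(a->oid)) != 0;
}

/*
 * Decide whether a copied entry needs a replacement entry in the split
 * index.  *compared reports whether the thorough comparison was run.
 */
static void mark_copied_entry(struct entry *copy, const struct entry *base,
			      bool *compared)
{
	*compared = false;
	if (copy->flags & CE_UPDATE_IN_BASE)
		return; /* already marked: skip the comparison */

	*compared = true;
	if (cached_data_differs(copy, base))
		copy->flags |= CE_UPDATE_IN_BASE;
}
```

    In this model an entry whose flag is already set never reaches the
    comparison, while an unmarked entry is still compared as a safety
    net.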
    
    Note that comparing the cached data in copied and original entries in
    the shared index might actually be entirely unnecessary.  In theory
    all code paths refreshing the cached stat data of an entry in the
    shared index should set the CE_UPDATE_IN_BASE flag in that entry, and
    unpack_trees() should preserve this flag when copying cache entries.
    This means that the cached data is only ever changed if the
    CE_UPDATE_IN_BASE flag is set as well.  Our test suite seems to
    confirm this: instrumenting the conditions in question and running the
    test suite repeatedly with 'GIT_TEST_SPLIT_INDEX=yes' showed that the
    cached data in a copied entry differs from the data in the shared
    entry only if its CE_UPDATE_IN_BASE flag is indeed set.
    
    In practice, however, our test suite doesn't have 100% coverage,
    GIT_TEST_SPLIT_INDEX is inherently random, and I certainly can't claim
    to possess complete understanding of what goes on in unpack_trees()...
    Therefore I kept the comparison of the cached data when
    CE_UPDATE_IN_BASE is not set, in case an overlooked or future code
    path accidentally fails to set this flag upon refreshing the cached
    stat data, or unpack_trees() drops this flag while copying a cache
    entry.
    
    [1] Note that when unpack_trees() constructs the new index and decides
        that a cache entry should now refer to different content than what
        was recorded in the original index (e.g. 'git read-tree -m
        HEAD^'), then that can't really be considered a copy of the
        original, but rather the creation of a new entry.  Notably and
        pertinent to the split index feature, such a new entry doesn't
        have a reference to the original's shared index entry anymore,
        i.e. its 'index' field is set to 0.  Consequently, such an entry
        is treated by prepare_to_write_split_index() as an entry not
        present in the shared index and it will be added to the new split
        index, while the original entry will be marked as deleted, and
        neither the above discussion nor the changes in this patch apply
        to them.
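
    The distinction drawn in [1] can be illustrated with a small model.
    It is only a sketch under stated assumptions, not Git's code:
    'struct new_entry', classify_entries() and the 'deleted' array are
    hypothetical, and 'index' is assumed to be a 1-based position in
    the shared index, with 0 meaning "no reference".

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a cache entry in the newly constructed index. */
struct new_entry {
	unsigned int index; /* assumed 1-based shared position, 0 = none */
};

/*
 * Walk the new index.  Entries with index == 0 count as additions to
 * the split index; every shared entry that no new entry refers to is
 * left flagged in deleted[].  Returns the number of additions.
 */
static size_t classify_entries(const struct new_entry *entries, size_t nr,
			       bool *deleted, size_t shared_nr)
{
	size_t additions = 0;

	for (size_t i = 0; i < shared_nr; i++)
		deleted[i] = true; /* assume deleted until referenced */

	for (size_t i = 0; i < nr; i++) {
		unsigned int pos = entries[i].index;

		if (pos == 0)
			additions++;               /* new entry, not a copy */
		else if (pos <= shared_nr)
			deleted[pos - 1] = false;  /* copy of a shared entry */
	}
	return additions;
}
```

    Only the entries that keep a nonzero 'index' in this model are the
    copies that the discussion and the change in this patch concern.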
    
    Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>