Do not cache disappearing repositories
As detailed in gitlab-ce#14890, my instance of gitlab-ce (Omnibus) recently had some problems with repositories on an NFS mount:
- repositories are kept on an NFS mount (actually the entire /var/opt/gitlab/git-data/repositories/ directory)
- the NFS mount was temporarily unavailable
- any repository that was accessed while the NFS mount was down was marked as non-existent
All of the above is expected behaviour; what is not expected is the following:
- even after the NFS mount was available again, the repositories marked as non-existent still could not be accessed
- other repositories worked fine
I figure this is because GitLab caches the state of each repository: once it has recorded that a repository does not exist, it never checks again.
This was confirmed when flushing the entire GitLab cache solved the problem:
# gitlab-rake cache:clear
Now, any cache is only as good as its validation mechanism, and I don't think that manually flushing the cache (even via a crontab) is an acceptable option.
I therefore suggest improving the cache validation strategy, adding markers that validate data consistency before anything is cached, or both.
Improve the cache validation strategy
e.g. only cache objects that exist, and never cache the non-existence of an object.
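The "only cache existing objects" idea can be sketched as a cache that stores positive lookups only. This is a minimal Python illustration under stated assumptions, not GitLab's actual code: the class name RepositoryExistenceCache, the probe callback and the use of a plain directory check are all hypothetical. Misses are returned to the caller but never stored, so once the NFS mount is back, the next access finds the repository again.

```python
import os
from typing import Callable, Set


class RepositoryExistenceCache:
    """Hypothetical cache that remembers only repositories known to exist.

    A negative result ("repository not found") is never stored, so a
    repository that was unreachable during an NFS outage is looked up
    again on the next access instead of being reported missing forever.
    """

    def __init__(self, probe: Callable[[str], bool] = os.path.isdir) -> None:
        # probe checks the backing storage; a plain directory check is an
        # assumption standing in for GitLab's real repository lookup.
        self._probe = probe
        self._known_existing: Set[str] = set()

    def exists(self, path: str) -> bool:
        if path in self._known_existing:
            return True  # cached positive result, no storage access
        found = self._probe(path)
        if found:
            self._known_existing.add(path)  # cache successes only
        return found  # a miss is reported but never cached

    def invalidate(self, path: str) -> None:
        # Explicit removal, e.g. when a repository is deleted.
        self._known_existing.discard(path)


if __name__ == "__main__":
    # Simulate an NFS outage followed by recovery.
    storage = {"mounted": False}
    cache = RepositoryExistenceCache(
        probe=lambda path: storage["mounted"] and path == "group/project.git"
    )
    print(cache.exists("group/project.git"))  # False: mount is down
    storage["mounted"] = True
    print(cache.exists("group/project.git"))  # True: the miss was not cached
```

A cached positive entry can still go stale if a repository is deleted, so in this sketch the hypothetical invalidate method would have to be called from the deletion path.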
Check data consistency before caching
e.g. if the data partition is gone, GitLab should refuse to work rather than assume that it should start anew.
For example, ownCloud marks directories as in use with a special (hidden) file, and refuses to work in the absence of a properly set-up system.
(When implementing something like this, it would also be nice to be able to mark directories as not-for-use, so that they are never used accidentally.)
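A marker-file check along the lines of the ownCloud approach might look like the following. It is a hypothetical Python sketch: the marker names .gitlab-data and .gitlab-do-not-use, and the functions check_storage, initialize_storage and repository_exists, are invented for this example. The idea is that when the NFS export is not mounted, the local mount-point directory is usually still present but empty, so a missing marker lets the application raise an error instead of reporting (and possibly caching) "repository not found".

```python
from pathlib import Path

# Hypothetical marker names; GitLab does not define these.
IN_USE_MARKER = ".gitlab-data"
NOT_FOR_USE_MARKER = ".gitlab-do-not-use"


class StorageNotReady(RuntimeError):
    """Raised when the storage root cannot be trusted."""


def check_storage(root: Path) -> None:
    """Refuse to work unless the root is explicitly marked as in use."""
    if (root / NOT_FOR_USE_MARKER).exists():
        raise StorageNotReady(f"{root} is marked as not-for-use")
    if not (root / IN_USE_MARKER).is_file():
        # An unmounted NFS mount point is typically an empty local directory,
        # so a missing marker means "storage unavailable", not "no repositories".
        raise StorageNotReady(f"{root} has no {IN_USE_MARKER} marker; is it mounted?")


def initialize_storage(root: Path) -> None:
    """Mark a freshly provisioned root as in use (run once at setup)."""
    root.mkdir(parents=True, exist_ok=True)
    if (root / NOT_FOR_USE_MARKER).exists():
        raise StorageNotReady(f"{root} is marked as not-for-use")
    (root / IN_USE_MARKER).touch()


def repository_exists(root: Path, relative_path: str) -> bool:
    """Check the storage first, so an outage raises instead of returning False."""
    check_storage(root)
    return (root / relative_path).is_dir()
```

With this arrangement, a caller that caches lookup results would only ever cache answers obtained after check_storage has passed.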