Git repository history missing when using NFS on a high-traffic monorepo
Problem to solve
Isolated customers have reported that, when running a high-traffic monorepo on NFS, the repository reports an old value of HEAD on master, causing rebases and other operations to lose data.
Further details
A working theory is that Git successfully writes a change, but NFS somehow fails to persist it. The next merge commit to master is then created from a stale HEAD, causing previously merged work to be lost.
This may be similar to https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/
Testing showed that several NFS caching features can be disabled to prevent these inconsistencies.
Proposal
Document the NFS caching features that can cause inconsistencies in Git repository state under high write volumes when using multiple hot Gitaly nodes. The mount options
lookupcache=pos,actimeo=0,noac
can be used to prevent this, but they significantly increase IOPS and reduce performance.
We can link to the Gitaly Cluster docs and remind users that NFS was deprecated for Git storage in 13.0.
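As a rough illustration for the docs, the options above could be checked against an NFS mount's option string with a small script. This is only a sketch: the function names and the `RECOMMENDED` table are hypothetical, and it assumes the option string is taken as written (for example from /etc/fstab), since the kernel may report effective options in a different form.

```python
# Hypothetical helper: checks a comma-separated NFS mount option string
# for the cache-disabling options named in this issue.
# Flags without a value (such as noac) are stored with a value of None.
RECOMMENDED = {"lookupcache": "pos", "actimeo": "0", "noac": None}


def parse_mount_options(option_string):
    """Split 'rw,hard,actimeo=0' into {'rw': None, 'hard': None, 'actimeo': '0'}."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else None
    return options


def missing_options(option_string):
    """Return the recommended options that are absent or set to another value."""
    present = parse_mount_options(option_string)
    missing = []
    for key, value in RECOMMENDED.items():
        if key not in present or present[key] != value:
            missing.append(key if value is None else f"{key}={value}")
    return missing


if __name__ == "__main__":
    # Example option string; not taken from any real customer system.
    print(missing_options("rw,hard,lookupcache=all"))
```

An empty list means all three options are present as written. The check is purely textual and says nothing about whether the NFS server or client honors them.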
Links/references
https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/