VFS for Git protocol support (MVC)
As Git repositories become larger, performance can degrade in a variety of ways. Most notably, clones can take many hours for 100GB projects, and operations that read the index or check for changes become very slow because of their IO requirements.

VFS for Git (previously ~~GVFS~~) is an approach developed by Microsoft that addresses this by reducing the amount of data that needs to be cloned and retrieving data as needed through a virtual file system on the client. Cloning a repository fetches only the refs/metadata and the first tree object for the checked-out commit using the VFS for Git API. All other objects are fetched from the server on demand.

### Further details

This approach was developed by Microsoft to allow developers to work on the Windows project (100GB packfile, 300GB files, 3.5M files) using Git. An alternative approach that does not require a virtual file system, called [partial clone](https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt), is also being worked on in mainline Git.

A significant number of organizations with large centralized repositories stored in Perforce Depots want to move to Git but are unable to because of the size of their repositories.

**Server support**

- [Microsoft TFS](https://docs.microsoft.com/en-us/visualstudio/releasenotes/tfs2018-relnotes) – built in
- [Bitbucket plugin: GVFS for Bitbucket](https://marketplace.atlassian.com/apps/1217957/gvfs-for-bitbucket-server?hosting=server&tab=overview) – marketplace app that is unsupported and only available for self-hosted, single-server configurations
- GitHub – not available

**Client support**

VFS for Git is not supported by the [official Git client](https://github.com/git-for-windows/git). Microsoft has released a [custom version of Git for Windows](https://github.com/Microsoft/git/releases) that supports it.
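To make the on-demand model concrete, here is a minimal Python sketch of a client-side object store that downloads an object only the first time it is read. It is an illustration, not how VFS for Git is implemented: `LazyObjectStore`, `fetch_from_server`, the object IDs, and the text "tree" format are all hypothetical, and the network round trip to the server is replaced by a dictionary lookup.

```python
"""Illustrative sketch of on-demand object fetching (not VFS for Git's real code).

Assumption: the server is modelled as a plain dict of object ID -> bytes, and
`fetch_from_server` is a hypothetical stand-in for an HTTP request to the server.
"""
from typing import Callable, Dict, List


class LazyObjectStore:
    """Hypothetical client-side store that downloads objects only when read."""

    def __init__(self, fetch: Callable[[str], bytes], initial: Dict[str, bytes]):
        # Objects obtained at clone time (e.g. the root tree of the checked-out commit).
        self._local: Dict[str, bytes] = dict(initial)
        self._fetch = fetch
        self.fetched: List[str] = []  # object IDs downloaded after the clone

    def read(self, oid: str) -> bytes:
        # Serve from the local cache if present; otherwise fetch once and cache it.
        if oid not in self._local:
            self._local[oid] = self._fetch(oid)
            self.fetched.append(oid)
        return self._local[oid]


# Hypothetical server-side object database with a simplified, made-up tree format.
SERVER_OBJECTS = {
    "tree-root": b"100644 blob blob-a\tREADME.md\n100644 blob blob-b\tbig.bin\n",
    "blob-a": b"# Hello\n",
    "blob-b": b"\x00" * 1024,
}


def fetch_from_server(oid: str) -> bytes:
    # Stand-in for a network round trip to the server.
    return SERVER_OBJECTS[oid]


if __name__ == "__main__":
    # "Clone": only the root tree is present locally.
    store = LazyObjectStore(fetch_from_server, {"tree-root": SERVER_OBJECTS["tree-root"]})
    store.read("blob-a")  # first read triggers a download
    store.read("blob-a")  # second read is served from the local cache
    print(store.fetched)  # ['blob-a']; big.bin was never downloaded
```

The point is the access pattern: the clone pays only for the root tree, and large files that nobody opens (here `big.bin`) are never transferred.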
[VFS for Git](https://github.com/Microsoft/VFSForGit/blob/master/License.md) (user land, [MIT](https://github.com/Microsoft/VFSForGit/blob/master/License.md)) is being built on [Windows Projected File System](https://docs.microsoft.com/en-us/windows/desktop/projfs/projected-file-system) (kernel, _closed source?_).

- :white_check_mark: Windows – [Docs](https://github.com/Microsoft/VFSForGit#installing-vfs-for-git)
- :warning: Mac – active development, [manual build steps](https://github.com/Microsoft/VFSForGit#building-vfs-for-git-on-mac)
- :warning: Linux – prototype [source](https://github.com/Microsoft/VFSForGit/tree/features/linuxprototype)

### Vision

GitLab should support large projects to the best possible extent, so that customers with large non-Git projects can migrate to Git and GitLab (see https://gitlab.com/groups/gitlab-org/-/epics/773).

While multiple strategies for handling large repositories emerge, GitLab should offer prototype support for VFS for Git to help evaluate the strengths and weaknesses of the different approaches (VFS for Git vs. partial clone), and be an active contributor to solving this problem for customers using GitLab.

### Proposal

Investigations and conversations with Microsoft make it clear that the complexity of implementing GVFS on the server side lies in a high-performance implementation of the `GET /gvfs/prefetch` route.

We should implement an experimental MVC behind a **feature flag** that supports all mandatory routes and, ideally, an unoptimized version of the `prefetch` route. This will allow us to test a naive implementation in real-world use, demonstrate a connection between a GVFS client and GitLab, and provide a baseline for future iterations that improve performance.
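A naive `prefetch` handler mainly has to decide which objects to send. The Python sketch below assumes, based on our reading of the linked Protocol.md (verify before relying on it), that prefetch returns commits and trees but no blobs, filtered by a client-supplied timestamp (the protocol's `lastPackTimestamp` parameter), and that a missing timestamp means "send everything". The object model, the `feature_enabled` switch standing in for a GitLab feature flag, and every function name are hypothetical, not GitLab or Gitaly code. The sketch stops at computing the object list; building a packfile from it (for example by piping the object IDs to `git pack-objects`) is out of scope.

```python
"""Naive, unoptimized sketch of what a GVFS prefetch handler has to compute.

Assumptions (verify against Protocol.md): prefetch serves commits and trees but
not blobs, filtered by a timestamp the client sends. The in-memory object model
and all names here are hypothetical.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Commit:
    oid: str
    timestamp: int  # committer time, seconds since the epoch
    root_tree: str


@dataclass
class Tree:
    oid: str
    # (kind, oid) pairs; kind is "tree" or "blob"
    entries: List[Tuple[str, str]] = field(default_factory=list)


def prefetch_objects(commits: List[Commit], trees: Dict[str, Tree],
                     last_pack_timestamp: int) -> List[str]:
    """Return IDs of commits newer than the timestamp plus every reachable tree.

    Blobs are skipped; the client fetches them on demand later. Every tree
    reachable from the selected commits is walked on each request, which is
    what makes this version naive.
    """
    selected: List[str] = []
    seen: Set[str] = set()

    def walk(tree_oid: str) -> None:
        # Depth-first walk that emits each distinct tree once.
        if tree_oid in seen:
            return
        seen.add(tree_oid)
        selected.append(tree_oid)
        for kind, oid in trees[tree_oid].entries:
            if kind == "tree":
                walk(oid)

    for commit in sorted(commits, key=lambda c: c.timestamp):
        if commit.timestamp > last_pack_timestamp:
            selected.append(commit.oid)
            walk(commit.root_tree)
    return selected


def handle_prefetch(feature_enabled: bool, commits: List[Commit],
                    trees: Dict[str, Tree],
                    last_pack_timestamp: Optional[int]) -> Tuple[int, List[str]]:
    # Behind a (hypothetical) feature flag: act as if the route does not exist.
    if not feature_enabled:
        return 404, []
    # Assumption: no timestamp from the client means "send everything".
    since = last_pack_timestamp if last_pack_timestamp is not None else 0
    return 200, prefetch_objects(commits, trees, since)


if __name__ == "__main__":
    trees = {
        "t1": Tree("t1", [("blob", "b1"), ("tree", "t2")]),
        "t2": Tree("t2", [("blob", "b2")]),
        "t3": Tree("t3", [("blob", "b3"), ("tree", "t2")]),
    }
    commits = [Commit("c1", 100, "t1"), Commit("c2", 200, "t3")]
    print(handle_prefetch(True, commits, trees, 150))   # (200, ['c2', 't3', 't2'])
    print(handle_prefetch(False, commits, trees, 150))  # (404, [])
```

Recomputing the reachable trees on every request is acceptable for an MVC baseline; a high-performance implementation would need to avoid redoing that work on every call.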
### Links

- https://blogs.msdn.microsoft.com/visualstudioalm/2017/02/03/announcing-gvfs-git-virtual-file-system/
- https://blogs.msdn.microsoft.com/devops/2018/02/26/vststfs-roadmap-update-for-2018-q1-and-q2/
- https://github.com/Microsoft/gvfs/blob/master/Protocol.md

### Customers

- https://na34.salesforce.com/0016100000NmU19 – the customer has a massive multi-terabyte monorepo in a Perforce Depot that cannot be split.
- https://na34.salesforce.com/00161000004yLEy ~"needs investigation"
- https://na34.salesforce.com/00161000004bZPD ~"needs investigation"
- https://na34.salesforce.com/0016100000fDO7w ~"needs investigation"
- https://na34.salesforce.com/00161000003RH62 ~"needs investigation"
- https://na34.salesforce.com/00161000004zrG3 – the customer has a large repository where the initial clone is slow (>30 mins). Some rendering issues have been observed for a project with 27k files at depth=1.
- https://na34.salesforce.com/0016100000NmU19 ~"needs investigation"
- https://na34.salesforce.com/00161000006g08Q ~"needs investigation"