Recommended re-sharding strategy for GitLab and transition out of NFS
I am not sure this is the right place to discuss this, but some issues in this repo, such as #1335 (closed), talk about data sharding, and I am not sure what the official sharding strategy for GitLab is.
From my understanding of https://docs.gitlab.com/ce/administration/repository_storage_paths.html:
- Sharding is managed via the repository_storage path.
- An administrator can create a new storage path from time to time and change the default storage path for new projects.
- Re-sharding can be done by an admin through the REST API by changing repository_storage. In my tests this works, but it is documented only in the REST API reference, not in the admin guide. I am not sure about the limitations or stability of this feature.
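To make the last point concrete, here is a minimal sketch of the kind of call I tested, assuming the project edit endpoint (PUT /api/v4/projects/:id) accepts repository_storage when called with an admin token. The host name, token, project ID and the helper names build_move_request and send_request are placeholders for illustration, not anything GitLab ships.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build (but do not send) a PUT request asking GitLab to move a project
# to another storage shard. base_url, token and project_id are placeholders.
def build_move_request(base_url, token, project_id, target_storage)
  uri = URI.join(base_url, "/api/v4/projects/#{project_id}")
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token # assumed to be an admin token
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate('repository_storage' => target_storage)
  [uri, request]
end

# Send a previously built request and return the HTTP response.
def send_request(uri, request)
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Example usage (left commented out so nothing is sent by accident):
# uri, request = build_move_request('https://gitlab.example.com', 'ADMIN_TOKEN', 42, 'storage1')
# puts send_request(uri, request).code
```

Building the request separately from sending it makes it easy to inspect what would be sent before touching a production instance.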
It is also not clear from the road to 1.0 blog post https://about.gitlab.com/2018/09/12/the-road-to-gitaly-1-0/ how to completely remove NFS from the picture.
In the following image, https://about.gitlab.com/images/gitaly_arch.png, there are still NFS servers.
My understanding is that block storage would perform much better than NFS for Git-like I/O loads, so we would like to switch from NFS servers to block storage attached directly to the Gitaly servers.
As of 11.4, no operation in Rails or Sidekiq requires local disk access anymore.
Is it possible to run one Gitaly server per shard and use each Gitaly server's local volume as the data store?
The following uses the same path on each shard, but a different Gitaly server:
git_data_dirs({
'default' => { 'path' => '/data', 'gitaly_address' => 'tcp://gitalyd.internal:8075' },
'storage1' => { 'path' => '/data', 'gitaly_address' => 'tcp://gitaly1.internal:8075' },
})
Is this something that some of your customers are already using? Will the re-sharding feature of the API work in this setup?
Can this be used as a transition strategy for moving from NFS to Gitaly sharding?
For example:
git_data_dirs({
'default' => { 'path' => '/mnt/gitlab/repositories', 'gitaly_address' => 'unix:/var/opt/gitlab/gitaly/gitaly.socket' },
'storage1' => { 'path' => '/data', 'gitaly_address' => 'tcp://gitaly1.internal:8075' },
'storage2' => { 'path' => '/data', 'gitaly_address' => 'tcp://gitaly2.internal:8075' },
'storage3' => { 'path' => '/data', 'gitaly_address' => 'tcp://gitaly3.internal:8075' },
})
Would we then use the API to transition repositories to the shiny new Gitaly servers?
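As a rough sketch of how that transition could be planned, the following spreads project IDs over the new shards in round-robin order. The plan_moves helper and the project IDs are hypothetical; the storage names match the config above. Each resulting pair would then be applied by setting repository_storage on that project through the API.

```ruby
# Assign each project ID to one of the new shards in round-robin order.
# Returns a hash of project ID => target storage name.
def plan_moves(project_ids, target_storages)
  project_ids.each_with_index.map do |id, index|
    [id, target_storages[index % target_storages.length]]
  end.to_h
end

# Hypothetical project IDs currently living on the NFS-backed 'default' storage.
plan = plan_moves([11, 12, 13, 14], %w[storage1 storage2 storage3])
plan.each { |id, storage| puts "project #{id} -> #{storage}" }
```

Round-robin is only the simplest choice here; balancing by repository size would likely be better in practice, but that needs size data this sketch does not assume.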