Instead of configuring gitaly['configuration'][:storage] uniquely for each Gitaly node, it is often easier to include the configuration for all Gitaly nodes on every Gitaly node. You can do this because the Praefect virtual_storage configuration maps each storage name (such as gitaly-1) to a specific node, and requests are routed accordingly. This means every Gitaly node in your fleet can share the same configuration.
We're currently recommending configuring multiple Gitaly storages with different names that all point to the same directory. While this generally works, it's not a correct configuration: each storage should point to a unique directory, and it's surprising that multiple storages may share one.
As things stand, this would cause issues with Gitaly's upcoming storage changes. For example, Gitaly would attempt to open the embedded database on the storage multiple times.
Gitaly with Raft will use all configured storages. Each storage will be allocated a unique ID, and will be advertised to the other nodes in the cluster. Having multiple storages pointing to the same directory would cause conflicts.
This already seems to cause issues in Gitaly today. For example, our background repository maintenance routine doesn't appear to differentiate between the duplicate storages, and as a result schedules maintenance multiple times on the same data.
It would be possible to work around this by taking only the distinct set of storage paths, working on those internally, and routing calls for the duplicated storages to the same storage. That seems a bit unnecessary though, and we shouldn't be recommending such configurations to begin with.
Let's remove the recommendation from the documentation and deprecate configuring the same storage directory under multiple names. This will require customers to update their configuration. As this is a breaking change, we can probably only enforce it in %17.0, or earlier if transactions are opted into before then.
/cc @andrashorvath @mjwood I think this is something we should ideally do soon. We're currently recommending configuring Gitaly in a manner that is not entirely correct. Adding this to &10625 (closed) given it is negatively affecting the work there.
Naive question, while we're cleaning this up, would it be possible to ignore duplicates? (Or maybe it isn't because we can't tell which one is the good one?)
The open question though is whether it's in fact useful to have the same config on all the nodes, and whether we intend to keep supporting that (by changing its structure, for instance). @mjwood @gerardo
I imagine we'll at minimum need to:
Announce the change and fix the documentation, outlining clearly what users need to do
Add startup warnings if a bad configuration is found, pointing to documentation
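As a rough sketch of what such a warning could check (hypothetical code, not the actual Gitaly or Omnibus implementation), duplicate detection only needs to group the configured storages by path:

```ruby
# Hypothetical startup check; the storage entries mirror the gitlab.rb format.
storages = [
  { name: 'gitaly-1', path: '/var/opt/gitlab/git-data/repositories' },
  { name: 'gitaly-2', path: '/var/opt/gitlab/git-data/repositories' },
  { name: 'other',    path: '/mnt/other/git-data/repositories' },
]

# Group storages by path; any path used by more than one storage is deprecated.
storages.group_by { |s| s[:path] }.each do |path, entries|
  next if entries.size == 1

  names = entries.map { |s| s[:name] }.join(', ')
  warn "DEPRECATED: storages #{names} share the path #{path}; " \
       'each storage must point to a unique directory.'
end
```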
Naive question, while we're cleaning this up, would it be possible to ignore duplicates? (Or maybe it isn't because we can't tell which one is the good one?)
@andrashorvath yes, it should be possible to take the distinct storages and only work on those internally. Gitaly itself shouldn't care much about the storage names. The configured storage names are, however, used in the API to access the correct repositories, as repositories are identified by (storage_name, relative_path). One easy way to handle that would be to add an interceptor that rewrites the storage name to whatever internal name we use for the deduplicated storage. That could be done during the deprecation period if needed.
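Gitaly itself is written in Go, but conceptually the rewrite boils down to a small name mapping. Here's a rough Ruby illustration with made-up storage names (not actual Gitaly code):

```ruby
# Storages as configured: two names share the same path.
storages = {
  'gitaly-1' => '/var/opt/gitlab/git-data/repositories',
  'gitaly-2' => '/var/opt/gitlab/git-data/repositories',
  'other'    => '/mnt/other/git-data/repositories',
}

# The first configured name per path becomes the canonical (deduplicated) storage.
canonical_by_path = {}
storages.each { |name, path| canonical_by_path[path] ||= name }

# Every configured name maps to its canonical equivalent; an interceptor would
# apply this mapping to the storage name in incoming requests.
rewrite = storages.to_h { |name, path| [name, canonical_by_path[path]] }

rewrite['gitaly-2'] # => "gitaly-1": requests for gitaly-2 are served by the deduplicated storage
```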
The open question though is whether it's in fact useful to have the same config on all the nodes, and whether we intend to keep supporting that (by changing its structure for instance).
Outside of Praefect, there's no benefit to it. The main reason this is done with Praefect is so the same storage configuration can be used on every Gitaly node.
@andrashorvath @samihiltunen since we're already in the %17.0 release cycle, there won't be a deprecation period, right? Don't we need to perform the breaking change right now?
@andrashorvath right, we added the notice but never updated Gitaly or Omnibus to warn, which ideally should have been done several releases ago. Now that we're at 17.0 we must add hard errors right now or wait until 18.0 to do so, is that accurate?
@andrashorvath - I'm assuming that this would be a breaking change for %17.0 that we'd want to start announcing now (%16.10). I'll schedule this to ensure we have the runway to meet the deadlines.
I want to be sure I understand the impact on a specific use case. Let's say that each Gitaly node is already configured independently (using sharded Gitaly), but is configured to manage multiple repository storages (unique to each node and corresponding disk). My understanding is that this change should not impact that configuration, so long as each repository storage points to a unique, existing directory on disk. Is that correct?
Here is a dummy example for the sake of illustrating this use case:
gitaly-1 is configured to manage these storages: default, storage1 and handbook
gitaly-2 => [storage2, engineering]
And
default, storage1, storage2, handbook, and engineering are each unique repository storages, pointing to their own directories on their respective Gitaly node's disk.
@troblot configuring multiple storages on a Gitaly node that each point to a unique directory will still keep working.
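For example, a gitaly-1 configuration along these lines stays valid because every storage has its own directory (the storage names follow the example above; the extra paths are made up for illustration):

```ruby
gitaly['configuration'] = {
  storage: [
    # Each storage points to a unique directory, so this keeps working.
    { name: 'default',  path: '/var/opt/gitlab/git-data/repositories' },
    { name: 'storage1', path: '/mnt/storage1/git-data/repositories' },
    { name: 'handbook', path: '/mnt/handbook/git-data/repositories' },
  ],
}
```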
Configuring multiple storages that point to the same directory will no longer work. Below is an excerpt from our setup guide for Gitaly Cluster; it showcases the configuration that will no longer work:
```ruby
# You can include the data dirs for all nodes in the same config, because
# Praefect will only route requests according to the addresses provided in the
# prior step.
gitaly['configuration'] = {
  # ...
  storage: [
    {
      name: 'gitaly-1',
      path: '/var/opt/gitlab/git-data/repositories',
    },
    {
      name: 'gitaly-2',
      path: '/var/opt/gitlab/git-data/repositories',
    },
    {
      name: 'gitaly-3',
      path: '/var/opt/gitlab/git-data/repositories',
    },
  ],
}
```
@andrashorvath - How is this coming along for %16.10? I think this is rather important: if we're deprecating functionality in %17.0, we should probably start throwing warnings soon, as well as have updated / corrected documentation.
I checked by logging in to a random Gitaly node and inspecting /etc/gitlab/gitlab.rb. Based on this comment from @samihiltunen, I looked for a similar configuration and found one: the same path /var/opt/gitlab/git-data/repositories is used for 147 storages, e.g.
From the deprecation spreadsheet that I have, this is marked as low impact. What happens if we don't change this config and upgrade Gitaly to 17.0?
Who should take care of this change? I don't think I have the domain knowledge to do it. Of course, I'm always happy to help from my side (Delivery Group).
For future reference, the production Gitaly storage configuration referred to is here.
Does it mean gprd is impacted?
@dat.tang.gitlab yes, gprd seems to be configuring multiple storages pointing to the same directory as visible in the snippet above, and is thus affected by the deprecation.
From the deprecation spreadsheet that I have, this is marked as low impact. What happens if we don't change this config and upgrade Gitaly to 17.0?
I don't believe we've done any code changes to enforce this yet so nothing would happen as of yet.
The deprecations should be accompanied by code that enforces the removal so customers get a warning before upgrading to a version that removes support for their configuration. Once that's added, this should lead to a failure.
Who should take care of this change? I don't think I have the domain knowledge to do it. Of course, I'm always happy to help from my side (Delivery Group).
I'm not sure exactly through which process the deprecations have been handled. Previously I've seen SREs handle the configuration changes that apply to our own services. Generally this could be handled by anyone with the knowledge of our chef setup.
@dat.tang.gitlab thank you for surfacing that this impacts .com. Dedicated may also be affected.
This is now scheduled for %17.0, so there's still a chance the code makes the cut (otherwise this is a breaking change that would potentially have to wait for the next major version).
The bot that was supposed to add this label failed again; I'll set up another manual stopgap query to run weekly to catch these.
Who should take care of this change?
Now that Gitaly has embedded SREs, we would (and in any case it's good policy, IMHO, that the group making a breaking change at least endeavors to migrate everyone). -> No duplicate storage names on SaaS (#5997 - closed)
This is also impacting Delivery group activities related to deployments and introducing risk to them.
To add more detail to this concern: if this change is merged and deployed to gprd and doesn't go well, how much time and effort would we need to revert it?
I'll defer to @samihiltunen for the difficulty of the code changes, but naively this also shouldn't be hard. I expect this to be behind a FF (perhaps default on before the release) so that it can be turned off if problematic, plus the usual release process.
We'd like to get this into %17.0 because it's blocking write-ahead log deployment, which is a Disaster Recovery priority and can't wait another year. If that doesn't work, we'll pursue an exception.
In addition I'm prioritizing this issue #5454 (comment 1728417774) so that we don't make the same mistake of dropping the ball again.
@andrashorvath @samihiltunen: I don't see any code change MRs linked to this issue, so I assume none have been done? When an MR is available, please link it here so we can track it. Thanks.
I'll defer to @samihiltunen for the difficulty of the code changes, but naively this also shouldn't be hard. I expect this to be behind a FF (perhaps default on before the release) so that it can be turned off if problematic, plus the usual release process.
Code changes here aren't particularly involved and @wchandler already has them underway in:
@samihiltunen @wchandler I'd love to correctly understand what's being referred to here as a "duplicate storage path".
From our docs and my understanding of the Gitaly Cluster architecture, I'd imagine that, even though the value defined for each storage path is the same, it would still resolve to a unique, distinct location on each Gitaly node member.
So, taking the example configuration shared previously:
/var/opt/gitlab/git-data/repositories would be the location for a separate storage on each Gitaly node. I believe that's also the assumption made by most users when setting up this configuration, as shown in the recent support tickets we've been receiving regarding this change.
The mention of this breaking change in the GitLab 17 changes suggests either coalescing all storages into one or giving each declaration a unique path. This is confusing when one assumes that each storage lives on a separate Gitaly node, which apparently is a wrong assumption?
Please shed some light here so we can better understand how to guide customers getting ready for this change.
This is confusing when one assumes that each storage lives on a separate Gitaly node, which apparently is a wrong assumption?
@dnldnz I believe this is the misunderstanding here.
This configuration snippet configures a single Gitaly node. It configures three storages on that node. Those storages are named gitaly-1, gitaly-2, and gitaly-3, and they all point to /var/opt/gitlab/git-data/repositories.
If you have three Gitaly nodes using this configuration, there will be a total of nine storages configured.
Node-1:
gitaly-1: /var/opt/gitlab/git-data/repositories
gitaly-2: /var/opt/gitlab/git-data/repositories
gitaly-3: /var/opt/gitlab/git-data/repositories
Node-2:
gitaly-1: /var/opt/gitlab/git-data/repositories
gitaly-2: /var/opt/gitlab/git-data/repositories
gitaly-3: /var/opt/gitlab/git-data/repositories
Node-3:
gitaly-1: /var/opt/gitlab/git-data/repositories
gitaly-2: /var/opt/gitlab/git-data/repositories
gitaly-3: /var/opt/gitlab/git-data/repositories
The problem is the duplication within a single node. It's fine to have a configuration like:
Node-1:
gitaly-1: /var/opt/gitlab/git-data/repositories
Node-2:
gitaly-2: /var/opt/gitlab/git-data/repositories
Node-3:
gitaly-3: /var/opt/gitlab/git-data/repositories
Each node would have a single storage pointing to the same path, and this is fine. It's not fine to have multiple storages within the same node pointing to the same path.
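In gitlab.rb terms, a valid version of the earlier excerpt configures only the node's own storage on each node. For example, on Node-1 (a sketch; adjust the storage name per node):

```ruby
# Node-1: configure only this node's own storage.
gitaly['configuration'] = {
  storage: [
    {
      name: 'gitaly-1', # 'gitaly-2' on Node-2, 'gitaly-3' on Node-3
      path: '/var/opt/gitlab/git-data/repositories',
    },
  ],
}
```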