The simplest approach may be to take each existing shard and just randomly pick one. But we may want to add probabilities to each mount. For example, let's say we had 3 NFS servers. NFS1 we could assign a probability of 0%, and NFS2 and NFS3 could have 50%.
@eReGeBe Since we're already loading this data in gitlab.rb under git_data_dirs how much harder would it be to add a weighting preference like what @stanhu is asking?
@DouweM@smcgivern@rspeicher This is kind of critical for our path forward and @eReGeBe is acting as a release manager, therefore I don't think he will have too many cycles left.
I'm a bit confused about how the UX for this would look. I guess we could replace the existing dropdown with checkboxes, and randomly choose a storage location from the locations selected? Adding weights seems pretty heavy on the UX for doing this quickly.
@northrup If I'm reading this correctly, the "Storage path for new projects" dropdown would be replaced by a way to configure weights (probabilities) for each path, so the UI would need to become somewhat more complex. Listing the paths with checkboxes, like @smcgivern suggests, seems like it would be the easiest way.
If we want to add probabilities, we could list the paths with numeric fields that would be used to specify the weight of that path. @stanhu's example would have NFS1 as 0, NFS2 as 1 and NSF2 as 1. To get a 0%/33%/67% split, we would set 0/1/2.
@pcarranza@northrup I don’t understand how we could ship this feature without also updating the UI, since the current UI with the dropdown wouldn’t work anymore if we changed the behavior, and we can’t release this in a patch release / rc if the UI doesn’t work.
Setting this at config time in files also seems inconsistent with the previous decision to have the default storage location configurable in the UI. For a feature like this, I don’t really care whether the location weights are set in the UI or the config file, but we’ve historically preferred doing anything that can go in the UI, in the UI.
I may be missing something, but I don't see how we could just ship the behavior, and not the connection to the UI.
@DouweM I think you are right here. I think @pcarranza is saying that we want just want this capability and anything we can do to short-circuit the process (e.g. simplifying the UI) would be fine.
I'm good just showing a list of shards and percentages for each as they are setup in the configuration files, I don't need a fancy configuration UI (not now)
@northrup@pcarranza this isn't just stored in gitlab.rb, the currently-selected storage path is stored in the ApplicationSetting table. Storing a list of percentages in gitlab.rb also implies changing the data structure in there, or adding a separate key for the weights, both of which are weird to me and I think the way they'd interact with the value in the DB is going to be confusing no matter what we do.
Basically, I think the quickest way to do this is allow multiple storage locations to be chosen with equal weight. Do we really need the weighting right now?
This is going in the GitLab product, it needs to be consistent even if we're the only ones who use it right away. Eventually someone else is going to want to use this and we need to make sure we can clearly explain how it works 🙂
The simplest option would be to not do weighting just yet, and add a <select> with multiple, so that the user can pick 1 or more storage locations to randomly have one be selected from.
@DouweM sure thing! One note - if the current path is stored in application_settings then storing multiple paths is going to involve either some sort of serialisation, or a data migration.
The database column we have is:
t.string"repository_storage",default: "default"
UNIX standard would be a pathspec, like /foo:/bar:/baz:/awk ward/foo - are we OK with this? It won't let paths with : in them be specified, but you can't put them in $PATH either, so.
@DouweM we'll need a database migration then, and I'd suggest it's no more effort to use Hash than Array, to give us somewhere to put the weights later. Wdyt?
Hang on, no we won't, interpreting a column containing YAML like: default parses to a String well enough, so as long as Rails doesn't get in the way it just needs slightly clever application code.
@nick.thomas Right, if we interpret the field as YAML, we can relatively easily have support for String (current), Array (new), Hash (future). It's only on ApplicationSetting, so deserialization cost is minimal.