Use weighted probabilities for randomized repository selection based on disk utilization
Problem to solve
GitLab allocates each new repository to a randomly selected shard, but the selection does not take the shard's remaining storage capacity into account.
This means that as some storage shards fill up, they remain just as likely as emptier shards to receive new repositories and fill up further.
Further details
A subset of available shards can be selected as the location for new repositories, but for GitLab.com this list is massive.
An administrator can manually monitor the nodes and block some from receiving new projects, but it would be better if GitLab were smarter about allocating projects to emptier nodes, so that admins don't need to worry about this.
Screenshot
For GitLab.com, the selection list looks like this:
This means that 5 of the 47 Gitaly nodes will be selected for new repositories.
Proposal
When a new project is created and is assigned to a Gitaly shard, use weighted probabilities based on available space to prefer emptier nodes.
Using a simple example:
- There are two Gitaly shards, file-01 and file-02.
- file-01 has 100MB free, file-02 has 50MB free.
- Using a weighted average, file-01 has double the likelihood of receiving a new repository compared to file-02, or technically 100 in 150, as file-02 has 50/150 probability.
Using these weighted probabilities, file-01 receives roughly 2 out of every 3 new repositories created, whereas file-02 will be allocated one in three.
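The two-shard example can be sketched in a few lines of Python. This is only an illustration of the idea, not GitLab's implementation: the function names `selection_probabilities` and `pick_shard` and the free-space dictionary are hypothetical, and the free space is assumed to be known up front.

```python
import random


def selection_probabilities(free_space):
    """Return each shard's chance of selection: its free space / total free space."""
    total = sum(free_space.values())
    if total <= 0:
        raise ValueError("no shard has free space")
    return {name: free / total for name, free in free_space.items()}


def pick_shard(free_space, rng=random):
    """Pick one shard at random, weighted by its free space."""
    names = list(free_space)
    weights = [free_space[name] for name in names]
    # random.choices accepts relative weights; they need not sum to 1.
    return rng.choices(names, weights=weights, k=1)[0]


# Free space in MB, taken from the example above.
free_mb = {"file-01": 100, "file-02": 50}
probabilities = selection_probabilities(free_mb)  # file-01: 100/150, file-02: 50/150
chosen = pick_shard(free_mb)
```

Because the weights are relative, the same code works unchanged for any number of shards and any unit of free space.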
Why is this better?
Currently, all new repositories are concentrated on a small number of shards. This puts a great deal of pressure on these nodes, particularly from things like large imports.
Worse, large prospects land on the same nodes as abusers.
Using the weighted probability, all or almost all of the nodes can be selected to receive new repositories, but each node's allocation rate will be proportional to its free space, so fuller nodes receive fewer new repositories.
This has several advantages:
- Load is spread
- New nodes can be allocated in a staggered fashion
Note that we would retain the ability to prevent certain nodes from receiving new repositories, but we would need to use it much less. New Gitaly shards could be added without first disabling new repositories on the last generation of nodes.
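As a rough sketch of how the manual block could coexist with weighting, the snippet below drops blocked shards before computing weights. The `Shard` class, its `accepts_new_repositories` flag, the `file-03` node and all free-space figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    name: str
    free_mb: int
    accepts_new_repositories: bool = True  # hypothetical admin-controlled flag


def allocation_weights(shards):
    """Return selection probabilities for shards that are allowed to receive repositories."""
    eligible = [s for s in shards if s.accepts_new_repositories and s.free_mb > 0]
    total = sum(s.free_mb for s in eligible)
    if total == 0:
        return {}
    return {s.name: s.free_mb / total for s in eligible}


shards = [
    Shard("file-01", free_mb=100),
    Shard("file-02", free_mb=50, accepts_new_repositories=False),  # blocked by an admin
    Shard("file-03", free_mb=850),  # newly added, mostly empty node
]
weights = allocation_weights(shards)
```

Under these assumed numbers the new node receives most new repositories (850 in 950), while the older, unblocked node keeps receiving a smaller share without any manual intervention.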
Additionally, if repositories are deleted from older nodes, the freed space would be reused for new repositories, and the likelihood of this would increase as more space is freed.