Elasticsearch data distribution between shards is highly skewed
Summary
We have been noticing that there is a skewed data allocation among nodes in the Elasticsearch cluster for Gitlab. We see a trend where the data usage for nodes is closely a geometric progression with a factor of 2. We ae currently using a cluster with 7 nodes where the number of primary shards is 5(default is used in Gitlab) and number of replicas is 2. Due to this distribution of data most of the nodes is under utilised and 1 of them is over-utilised. We have already doubled out data storage twice because of this and we fear that if any of the large repositories is forked we would have to double it again becuase of the skewness.
Steps to reproduce
(How one can reproduce the issue - this is very important)
Example Project
(If possible, please create an example project here on GitLab.com that exhibits the problematic behaviour, and link to it here in the bug report)
(If you are using an older version of GitLab, this will also determine whether the bug has been fixed in a more recent version)
What is the current bug behavior?
(What actually happens)
What is the expected correct behavior?
(What you should see instead)
Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's very hard to read otherwise.)
Output of checks
(If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com)
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:env:info
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production
)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)
Possible fixes
(If you can, link to the line of code that might be responsible for the problem)