Improve Elasticsearch storage by merging segments periodically
Problem
When we last did maintenance on our index, we noticed that we saved around 30% on storage just by merging segments and expunging deletes using curl -XPOST "$CLUSTER_URL/gitlab-production-202010260000/_forcemerge?only_expunge_deletes=true".
Solution
Elasticsearch should be merging segments periodically, but the fact that we ended up with around 30% storage overhead suggests it is not doing so efficiently enough. We have also noticed a very high growth rate in our index storage, which is somewhat concerning, and the lack of merging may be the cause.
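To check how much of the index is taken up by deleted documents before deciding on a fix, something like the following could help. This is a minimal sketch, assuming the cluster is reachable without authentication and that the ratio of deleted to total documents is an acceptable proxy for storage overhead (it is not an exact measure of bytes on disk). The function names are hypothetical; the response shape is that of the index stats API (GET <index>/_stats/docs).

```python
import json
import urllib.request


def deleted_ratio(stats: dict) -> float:
    """Return the fraction of documents in primary shards marked as deleted."""
    docs = stats["_all"]["primaries"]["docs"]
    live = docs["count"]
    deleted = docs["deleted"]
    total = live + deleted
    # An empty index has no overhead; avoid dividing by zero.
    return deleted / total if total else 0.0


def fetch_stats(cluster_url: str, index: str) -> dict:
    """Fetch document stats for one index (hypothetical helper, no auth)."""
    url = f"{cluster_url.rstrip('/')}/{index}/_stats/docs"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Example usage (not executed here):
#   stats = fetch_stats("http://localhost:9200", "gitlab-production-202010260000")
#   print(f"{deleted_ratio(stats):.1%} of documents are marked as deleted")
```

Tracking this ratio over time would show whether deletes accumulate faster than background merges reclaim them.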
We could look into ways to periodically force merge the index ourselves (using a cron worker or something similar). That said, this is slightly risky: force merging is not encouraged while you are writing to the index, and with only_expunge_deletes=true the merge took around 5 hours to complete.
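A periodic job along these lines might look like the sketch below. It is only an illustration, not a proposed implementation: the threshold, the writes_paused flag, and all function names are assumptions. The request itself is the same _forcemerge call with only_expunge_deletes=true used during maintenance. The force merge API blocks until the merge completes, hence the long timeout; if the client connection drops, the merge continues in the background.

```python
import urllib.request
from urllib.parse import quote, urlencode


def force_merge_url(cluster_url: str, index: str) -> str:
    """Build the _forcemerge URL that only expunges deleted documents."""
    query = urlencode({"only_expunge_deletes": "true"})
    return f"{cluster_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"


def should_merge(deleted_ratio: float, writes_paused: bool,
                 threshold: float = 0.2) -> bool:
    """Merge only when writes are paused and enough deletes have piled up.

    Both the threshold and the writes_paused guard are hypothetical policy
    choices, not values taken from our cluster.
    """
    return writes_paused and deleted_ratio >= threshold


def run_force_merge(cluster_url: str, index: str,
                    timeout_s: int = 6 * 3600) -> int:
    """POST the force merge and return the HTTP status (blocks until done)."""
    request = urllib.request.Request(force_merge_url(cluster_url, index),
                                     method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status
```

Gating on paused writes addresses the risk of merging while indexing is active, at the cost of delaying index updates for the duration of the merge.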
We could also investigate ways to configure Elasticsearch to be slightly more aggressive about merging segments and expunging deletes on its own.
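One candidate to investigate is the merge policy setting index.merge.policy.deletes_pct_allowed, which to my understanding controls the percentage of deleted documents the merge policy tolerates in the index. This is an assumption to verify: the setting is an expert-level one, and its default and allowed range appear to differ between Elasticsearch versions. The sketch below only builds the body for a PUT <index>/_settings request; the function name and the example value are hypothetical.

```python
import json


def merge_policy_settings(deletes_pct_allowed: float) -> dict:
    """Build an index settings body that lowers the tolerated deleted-docs share.

    The setting name is assumed to be available in the target Elasticsearch
    version; the valid range is version-dependent, so only a basic sanity
    check is done here.
    """
    if not 0 < deletes_pct_allowed <= 100:
        raise ValueError("deletes_pct_allowed must be a percentage in (0, 100]")
    return {"index.merge.policy.deletes_pct_allowed": deletes_pct_allowed}


# Example: body for PUT <index>/_settings (value chosen for illustration only).
body = json.dumps(merge_policy_settings(20.0))
```

A lower value should make background merges reclaim deleted documents sooner, in exchange for more merge I/O during normal indexing.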