Periodically remove deleted documents from the Elasticsearch index
GitLab deletes documents from Elasticsearch in a number of situations. In particular:
- An issue, MR, etc. is deleted
- A file is deleted from the master branch in a repository
- A project is deleted
However, these delete operations are actually soft deletions: the documents are only marked as deleted and keep occupying disk space until the segments that contain them are merged. Per https://stackoverflow.com/questions/20608417/elasticsearch-how-to-free-store-size-after-deleting-documents, freeing up that space may require us to run a force merge against the Elasticsearch index.
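For reference, a force merge is a single `POST` to the index's `_forcemerge` endpoint, and the `only_expunge_deletes` parameter restricts it to segments that contain deleted documents. Below is a minimal sketch of that call using only the Python standard library. The host `http://localhost:9200` and the index name `gitlab-production` in the usage comment are assumptions for illustration, and the helper names are hypothetical, not part of GitLab.

```python
import urllib.parse
import urllib.request


def forcemerge_url(base_url: str, index: str, only_expunge_deletes: bool = True) -> str:
    """Build the URL of the Elasticsearch force merge API for one index."""
    query = urllib.parse.urlencode(
        {"only_expunge_deletes": "true" if only_expunge_deletes else "false"}
    )
    quoted_index = urllib.parse.quote(index, safe="")
    return f"{base_url.rstrip('/')}/{quoted_index}/_forcemerge?{query}"


def run_forcemerge(base_url: str, index: str, timeout: float = 3600.0) -> bytes:
    """POST to the force merge endpoint and return the raw response body.

    The call blocks until the merge finishes, hence the generous timeout.
    """
    request = urllib.request.Request(
        forcemerge_url(base_url, index), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (assumed host and index name, adjust to the real cluster):
# run_forcemerge("http://localhost:9200", "gitlab-production")
```

A force merge can be I/O heavy on a large index, so it is probably best run during quiet periods.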
I note https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html, which says Elasticsearch merges segments automatically in the background. This should eventually free up the space held by deleted documents, so maybe all we need to do is tune the settings around this process.
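One way to judge whether automatic merging is keeping up would be to watch the share of deleted documents in the index over time. The sketch below reads the `docs.count` and `docs.deleted` figures from the index stats API (`GET /<index>/_stats/docs`) and computes that share. The function names are hypothetical, and what counts as "too many" deleted documents is left open here.

```python
import json
import urllib.request


def deleted_ratio(stats: dict) -> float:
    """Return deleted / (live + deleted) for the primaries in a _stats/docs response."""
    docs = stats["_all"]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    return deleted / total if total else 0.0


def fetch_doc_stats(base_url: str, index: str, timeout: float = 30.0) -> dict:
    """Fetch document statistics for one index from the index stats API."""
    url = f"{base_url.rstrip('/')}/{index}/_stats/docs"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (assumed host and index name):
# print(deleted_ratio(fetch_doc_stats("http://localhost:9200", "gitlab-production")))
```

If the ratio stays low without intervention, tuning the merge settings (or doing nothing) may be enough.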
If that's not sufficient, then I'd suggest we add a background operation to GitLab that runs the force merge on a schedule.
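To make the shape of such a scheduled operation concrete, here is a rough, language-neutral sketch in Python. In GitLab itself this would presumably be a periodic background worker rather than a loop like this. `run_on_schedule` and its parameters are hypothetical names, and `job` stands in for whatever function actually issues the force merge request.

```python
import time
from typing import Callable


def run_on_schedule(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `job` up to `max_runs` times, pausing between runs.

    Returns the number of runs that completed without raising.
    """
    completed = 0
    for run in range(max_runs):
        try:
            job()
            completed += 1
        except Exception as error:
            # One failed merge should not stop later scheduled runs.
            print(f"scheduled run {run} failed: {error}")
        if run < max_runs - 1:
            sleep(interval_seconds)
    return completed


# Usage: run a placeholder job once a day for a week.
# run_on_schedule(lambda: print("force merge here"), 24 * 60 * 60, 7)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.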