Periodically remove deleted documents from the Elasticsearch index

Noted by @astrachan per https://gitlab.zendesk.com/agent/tickets/103880 (internal link)

GitLab deletes documents from Elasticsearch in a number of situations. In particular:

  • An issue, MR, etc. is deleted
  • A file is deleted from the master branch in a repository
  • A project is deleted

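However, these delete operations are actually soft deletions: Elasticsearch only marks the documents as deleted, and the disk space is not reclaimed until the segments containing them are merged. Per https://stackoverflow.com/questions/20608417/elasticsearch-how-to-free-store-size-after-deleting-documents , freeing up space may require us to run a force merge (the _forcemerge API) against the Elasticsearch index.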
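To judge whether this is actually a problem on a given instance, the index stats API reports how many soft-deleted documents are still held in the index. A minimal Python sketch, assuming an unauthenticated cluster; the base URL and index name passed in are placeholders, not GitLab configuration:

```python
import json
import urllib.request


def deleted_ratio(stats: dict) -> float:
    """Fraction of documents in the primary shards that are marked deleted.

    `stats` is the parsed JSON body of GET /<index>/_stats/docs.
    """
    docs = stats["_all"]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    return deleted / total if total else 0.0


def fetch_deleted_ratio(base_url: str, index: str) -> float:
    """Query the stats API and return the deleted-document ratio.

    base_url and index are supplied by the caller (hypothetical values
    such as "http://localhost:9200" and "gitlab-production").
    """
    url = f"{base_url.rstrip('/')}/{index}/_stats/docs"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return deleted_ratio(json.load(resp))
```

A consistently high ratio would suggest that the automatic merging is not keeping up with our deletions.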

I note that https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html describes merging as something Elasticsearch does automatically in the background. This should free up the space held by deleted documents, so maybe all we need to do is adjust the settings around this process.

If that's not sufficient, then I'd suggest we add a background operation to GitLab that runs the force merge on a schedule.
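For illustration, the scheduled operation would boil down to one POST to the _forcemerge endpoint. The sketch below is in Python rather than GitLab's own code, and assumes an unauthenticated cluster; the function names, base URL and index name are hypothetical. It passes only_expunge_deletes=true so that only segments containing deleted documents are merged, rather than forcing a full merge:

```python
import json
import urllib.parse
import urllib.request


def forcemerge_url(base_url: str, index: str, only_expunge_deletes: bool = True) -> str:
    """Build the _forcemerge URL for one index."""
    query = urllib.parse.urlencode(
        {"only_expunge_deletes": str(only_expunge_deletes).lower()}
    )
    index_path = urllib.parse.quote(index, safe="")
    return f"{base_url.rstrip('/')}/{index_path}/_forcemerge?{query}"


def force_merge(base_url: str, index: str, timeout: float = 3600) -> dict:
    """POST to _forcemerge and return the parsed JSON response.

    The call blocks until the merge completes, hence the long timeout
    (an arbitrary value chosen for this sketch).
    """
    request = urllib.request.Request(forcemerge_url(base_url, index), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A force merge is I/O-heavy, so whatever schedules this should probably run it during quiet hours.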

/cc @smcgivern @victorwu