Handle version conflict errors in ElasticDeleteProjectWorker

What does this MR do and why?

Handles Elasticsearch::Transport::Transport::Errors::Conflict errors in ElasticDeleteProjectWorker, which contributes 26% of Global Search's error budget.

Logs

The failure occurs when multiple processes try to change the same Elasticsearch document at the same time. Elasticsearch keeps a version for each document so that it can check the document hasn't changed before applying another change.

https://www.elastic.co/guide/en/elasticsearch/reference/8.6/optimistic-concurrency-control.html
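As a rough illustration of the version check described above, here is a toy in-memory model. It does not use the Elasticsearch client, and `TinyStore` and `VersionConflict` are hypothetical names invented for this sketch. The real mechanism is described in the linked documentation.

```ruby
# Toy in-memory model of optimistic concurrency control. This is NOT the
# Elasticsearch client; VersionConflict and TinyStore are hypothetical names.
class VersionConflict < StandardError; end

class TinyStore
  Doc = Struct.new(:body, :version)

  def initialize
    @docs = {}
  end

  def create(id, body)
    @docs[id] = Doc.new(body, 1)
  end

  # Returns [body, version] for the stored document.
  def get(id)
    doc = @docs.fetch(id)
    [doc.body, doc.version]
  end

  # Applies the write only if the caller saw the latest version.
  def update(id, body, expected_version:)
    doc = @docs.fetch(id)
    if doc.version != expected_version
      raise VersionConflict, "expected v#{expected_version}, found v#{doc.version}"
    end

    @docs[id] = Doc.new(body, doc.version + 1)
  end
end

store = TinyStore.new
store.create("project_1", "original")

# Two writers read the same version before either of them writes.
_, seen_by_a = store.get("project_1")
_, seen_by_b = store.get("project_1")

# Writer A goes first and bumps the version.
store.update("project_1", "from A", expected_version: seen_by_a)

# Writer B still holds the old version, so its write is rejected.
conflict =
  begin
    store.update("project_1", "from B", expected_version: seen_by_b)
    false
  rescue VersionConflict
    true
  end
```

The second writer is rejected because its expected version is stale. Two concurrent ElasticDeleteProjectWorker runs are in the same situation.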

This happens most often in the remove_children_documents delete_by_query call (11607 of 11675 occurrences in the last 7 days).

The fix is to rescue the error and re-enqueue the worker with the same arguments after a delay, so that the conflicting change has finished by the time the worker retries.
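A minimal sketch of the rescue-and-re-enqueue pattern follows. It is not the actual diff. `ConflictError` stands in for Elasticsearch::Transport::Transport::Errors::Conflict so the sketch runs without the gem. `DeleteProjectWorkerSketch`, the injected deleter, and the 60-second delay are assumptions made for illustration. `perform_in` only mimics the shape of Sidekiq's `perform_in(interval, *args)` and records the call instead of scheduling a job.

```ruby
# Stand-in for Elasticsearch::Transport::Transport::Errors::Conflict so the
# sketch runs without the elasticsearch gem.
class ConflictError < StandardError; end

# Hypothetical worker; the real ElasticDeleteProjectWorker is not reproduced here.
class DeleteProjectWorkerSketch
  RETRY_DELAY_SECONDS = 60 # assumed value, not taken from the MR

  # Records scheduled retries instead of talking to Sidekiq.
  def self.scheduled
    @scheduled ||= []
  end

  # Mimics the shape of Sidekiq's perform_in(interval, *args).
  def self.perform_in(delay, *args)
    scheduled << [delay, args]
  end

  def initialize(deleter)
    @deleter = deleter
  end

  def perform(project_id, es_id)
    @deleter.call(project_id, es_id)
  rescue ConflictError
    # Another process changed the documents first; retry later with the same args.
    self.class.perform_in(RETRY_DELAY_SECONDS, project_id, es_id)
  end
end

# Simulated deleter that hits a conflict on the first call only.
calls = 0
flaky_deleter = lambda do |_project_id, _es_id|
  calls += 1
  raise ConflictError, "version conflict" if calls == 1

  :deleted
end

worker = DeleteProjectWorkerSketch.new(flaky_deleter)
worker.perform(1, "project_1") # conflict is rescued and a retry is recorded

delay, args = DeleteProjectWorkerSketch.scheduled.first
retry_result = worker.perform(*args) # the retry goes through
```

Rescuing inside `perform` means the conflict no longer surfaces as a job failure, and the delay gives the competing process time to finish before the retry.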

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. Check out master
  2. Run ElasticDeleteProjectWorker twice concurrently for the same project, using threads:
    project = Project.find(some_id) # replace some_id with a real project ID
    thread1 = Thread.new { ElasticDeleteProjectWorker.new.perform(project.id, project.es_id) }
    thread2 = Thread.new { ElasticDeleteProjectWorker.new.perform(project.id, project.es_id) }
    thread1.join
    thread2.join
  3. Note that it results in Elasticsearch::Transport::Transport::Errors::Conflict errors
  4. Check out this branch
  5. Run the threads again
  6. Note that it doesn't result in an error, and check that all the documents have been successfully removed from Elasticsearch

Related to #442823 (closed)

Edited by Madelein van Niekerk