Create generic Elasticsearch delete worker

Background

There are multiple delete workers used to clean up Elasticsearch data. Problems arise when indexes use non-project routing: when a project is transferred, new records are created under the new routing, but the records with the old routing are left behind in the index. This results in duplicate records that are hard to detect and that require Advanced search migration work to remedy.

As more document types are indexed into Elasticsearch, there is potential for other routing strategies to be introduced.

Current worker/delete code:

Delete workflows

Event
  • Project transfer to another group within the same root namespace
  • Project transfer to another root namespace
  • Group transfer to another group within the same root namespace
  • Group transfer to another root namespace
  • Project deleted
  • Group deleted

Proposal

Replace all existing removal workers with a single generic Elasticsearch removal worker that covers removing all data.

Use the traversal_ids field and a prefix query to remove data when the top-level root namespace changes.
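
To make the prefix approach concrete, here is a minimal sketch of the request body such a worker might send to the Elasticsearch delete by query API. It rests on assumptions that are not stated above: that traversal_ids is indexed as a keyword string of namespace IDs, each followed by a hyphen (for example "10-11-"), and that documents carry a project_id field. The function name is hypothetical.

```python
from typing import Any, Dict


def build_old_root_delete_query(project_id: int, old_root_namespace_id: int) -> Dict[str, Any]:
    """Build a delete-by-query body for documents left under the old root namespace.

    Assumptions (not confirmed by the issue): `traversal_ids` is a keyword
    string such as "10-11-", and each document has a `project_id` field.
    """
    # The trailing hyphen keeps root 1 ("1-") from also matching root 10 ("10-").
    old_prefix = f"{old_root_namespace_id}-"
    return {
        "query": {
            "bool": {
                "filter": [
                    # Limit the delete to the transferred project.
                    {"term": {"project_id": project_id}},
                    # Match only documents still under the old root namespace.
                    {"prefix": {"traversal_ids": {"value": old_prefix}}},
                ]
            }
        }
    }
```

Both clauses sit in a filter context, so no scoring is involved and the same body can be reused for any document type that carries these two fields.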

When groups or projects are transferred, the process should:

  • Queue the group or project for backfill
  • Queue a delete worker that is safe to run either before or after the backfill completes
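
One way to get a delete that is safe in either order is to target only documents whose traversal_ids differ from the project's current value, so freshly backfilled documents are never matched. The sketch below simulates this with an in-memory dictionary instead of a real cluster. The index layout, the routing key format, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, object]
Index = Dict[Tuple[str, str], Doc]  # keyed by (routing, document id)


def backfill(index: Index, project_id: int, doc_ids: List[str], traversal_ids: str) -> None:
    """Re-index the project's documents under routing derived from the new root."""
    routing = "n_" + traversal_ids.split("-")[0]  # hypothetical routing scheme
    for doc_id in doc_ids:
        index[(routing, doc_id)] = {"project_id": project_id, "traversal_ids": traversal_ids}


def delete_stale(index: Index, project_id: int, current_traversal_ids: str) -> int:
    """Delete the project's documents whose traversal_ids are not the current value."""
    stale = [
        key
        for key, doc in index.items()
        if doc["project_id"] == project_id and doc["traversal_ids"] != current_traversal_ids
    ]
    for key in stale:
        del index[key]
    return len(stale)


def run_transfer(delete_first: bool) -> Index:
    """Simulate a transfer from root 10 to root 20 with the two jobs in either order."""
    index: Index = {}
    backfill(index, 42, ["issue_1", "issue_2"], "10-11-")  # state before the transfer
    new_ids = "20-21-"
    if delete_first:
        delete_stale(index, 42, new_ids)
        backfill(index, 42, ["issue_1", "issue_2"], new_ids)
    else:
        backfill(index, 42, ["issue_1", "issue_2"], new_ids)
        delete_stale(index, 42, new_ids)
    return index
```

In this simulation both orderings end with only the documents under the new routing, and running delete_stale a second time removes nothing, which is the property the delete worker needs if jobs are retried.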
Edited by Terri Chu