Zoekt::Node state: :offline -> :lost
Problem to solve
Zoekt nodes can go offline while their records remain in the database. This can delay namespace reassignment when the namespace or its projects have no active indexing operations that would otherwise move indices or repository records into a failed or reassignment state.
Proposal
GitLab should detect when a Zoekt node has not communicated with Rails for 12 hours (as a starting threshold):
- create a new scope in zoekt node model to find nodes which have not communicated in 12 hours
- create a new task in the SchedulingService and emit an event: Search::Zoekt::NodeLostEvent
- something to indicate that the node has been offline for a LONG time
- Should not emit events if indexing is paused or disabled
- Emit for 1 node id at a time
- Run task every 10 minutes
- create a worker to handle the event for the node. The worker must not allow a duplicate job to run, so it needs a deduplication strategy set to until_executed
- lock the row for the node that is being deleted
- delete all zoekt tasks in batches
- delete all zoekt repositories in batches
- delete all zoekt indices in batches
- delete the node record itself
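
The steps above can be sketched in plain Ruby to make the intended flow concrete. This is only an illustration under stated assumptions, not the actual implementation: the in-memory `Store`, the `Node` struct, the `last_seen_at` attribute, the `zoekt_node_id` payload key, the helper method names, and the tiny batch size are all hypothetical stand-ins for the real models, scope, scheduling task, and worker.

```ruby
# Plain-Ruby sketch of the proposal. Every name here is a hypothetical
# stand-in; the real change would use the Rails models and a worker.

LOST_THRESHOLD_SECONDS = 12 * 60 * 60 # 12 hours, the starting threshold
BATCH_SIZE = 2 # deliberately tiny so the batching is visible

# Assumes each node records the last time it checked in with Rails.
Node = Struct.new(:id, :last_seen_at)

# In-memory stand-in for the database tables involved.
class Store
  attr_reader :nodes, :tasks, :repositories, :indices

  def initialize
    @nodes = {}        # node id => Node
    @tasks = []        # each record is a Hash with a :node_id key
    @repositories = []
    @indices = []
  end
end

# Equivalent of the proposed scope: nodes silent for longer than the threshold.
def lost_nodes(store, now: Time.now)
  cutoff = now - LOST_THRESHOLD_SECONDS
  store.nodes.values.select { |node| node.last_seen_at < cutoff }
end

# Equivalent of the scheduling task: one event per lost node, and no events
# at all while indexing is paused or disabled.
def lost_node_events(store, indexing_enabled:, indexing_paused:, now: Time.now)
  return [] if indexing_paused || !indexing_enabled

  lost_nodes(store, now: now).map do |node|
    { event: 'Search::Zoekt::NodeLostEvent', zoekt_node_id: node.id }
  end
end

# Removes the records that belong to node_id, BATCH_SIZE at a time.
# Returns the number of batches that were needed.
def delete_in_batches(records, node_id)
  batches = 0
  loop do
    batch = records.select { |r| r[:node_id] == node_id }.first(BATCH_SIZE)
    break if batch.empty?

    records.reject! { |r| batch.any? { |b| b.equal?(r) } }
    batches += 1
  end
  batches
end

# Equivalent of the event worker. A real worker would lock the node row
# first; this single-threaded sketch has nothing to lock, so that step is
# omitted. Returns false when the node is already gone, which makes a
# repeated call harmless.
def handle_node_lost(store, node_id)
  return false unless store.nodes.key?(node_id)

  delete_in_batches(store.tasks, node_id)
  delete_in_batches(store.repositories, node_id)
  delete_in_batches(store.indices, node_id)
  store.nodes.delete(node_id)
  true
end
```

In this sketch the scheduling side would call `lost_node_events` every 10 minutes and hand each payload to `handle_node_lost`. Deleting dependent records before the node itself mirrors the order listed above, so a failure partway through leaves the node record in place to be picked up on a later run.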
Edited by Dmitry Gruzd