
Zoekt::Node state: :offline -> :lost

Problem to solve

Zoekt nodes can go offline while their records remain in the database. This can delay namespace reassignment when the namespace or its projects have no active indexing operations that would put indices or repository records into a failed or reassignment state.

Proposal

GitLab should detect when a Zoekt node has not communicated with Rails for 12 hours (as a starting threshold).

  1. Create a new scope in the Zoekt node model to find nodes that have not communicated in 12 hours.
  2. Create a new task in the SchedulingService that emits an event, Search::Zoekt::NodeLostEvent, to indicate that the node has been offline for a long time.
    • Do not emit events if indexing is paused or disabled
    • Emit for one node ID at a time
    • Run the task every 10 minutes
  3. Create a worker to handle the event for the node. The worker should not allow a duplicate job to run, so it needs its deduplication strategy set to until_executed. The worker should:
    1. Lock the row for the node that is being deleted
    2. Delete all Zoekt tasks in batches
    3. Delete all Zoekt repositories in batches
    4. Delete all Zoekt indices in batches
    5. Delete the node record itself
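
The steps above can be sketched in plain Ruby to make the intended flow concrete. This is only an illustration under stated assumptions, not the actual implementation: it uses in-memory arrays instead of ActiveRecord, and every name in it (`LOST_THRESHOLD_SECONDS`, `lost_nodes`, `node_lost_event`, `delete_in_batches`, `handle_node_lost`, the `last_seen_at` field, and the shape of the event hash) is hypothetical. Row locking and the until_executed deduplication are not modeled; a comment marks where the lock would go.

```ruby
# Plain-Ruby sketch of the proposal; no Rails, Sidekiq, or database involved.
# Every constant, struct, and method name here is hypothetical.

LOST_THRESHOLD_SECONDS = 12 * 60 * 60 # 12 hours, the proposed starting value
BATCH_SIZE = 2                        # deliberately tiny so batching is visible

Node = Struct.new(:id, :last_seen_at)

# Stand-in for the proposed model scope: nodes silent for longer than the threshold.
def lost_nodes(nodes, now:)
  nodes.select { |node| now - node.last_seen_at > LOST_THRESHOLD_SECONDS }
end

# Stand-in for the scheduling task: returns at most one event per run,
# and nothing while indexing is paused or disabled.
def node_lost_event(nodes, now:, indexing_enabled:, indexing_paused:)
  return nil if !indexing_enabled || indexing_paused

  node = lost_nodes(nodes, now: now).first
  return nil unless node

  { name: 'Search::Zoekt::NodeLostEvent', zoekt_node_id: node.id }
end

# Removes the rows that belong to node_id, BATCH_SIZE rows at a time.
# Returns the number of batches that were needed.
def delete_in_batches(rows, node_id)
  batches = 0
  loop do
    ids = rows.select { |row| row[:node_id] == node_id }
              .first(BATCH_SIZE)
              .map { |row| row[:id] }
    break if ids.empty?

    rows.reject! { |row| ids.include?(row[:id]) }
    batches += 1
  end
  batches
end

# Stand-in for the worker: returns false if the node is already gone,
# which makes a repeated run for the same node a no-op.
def handle_node_lost(store, node_id)
  node = store[:nodes].find { |candidate| candidate.id == node_id }
  return false unless node

  # A real worker would lock the node row here before deleting anything.
  %i[tasks repositories indices].each do |table|
    delete_in_batches(store[table], node_id)
  end
  store[:nodes].delete(node)
  true
end
```

Deleting dependent records before the node itself mirrors the order listed in the proposal, and returning early when the node no longer exists keeps a repeated event for the same node harmless.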
Edited by Dmitry Gruzd