Zoekt indices stuck in evicted on STG

What does this MR do and why?

It is hard to understand how, on staging, there are indices with an evicted state and still having zoekt_replica_id. The logic that moves the indices to evicted and deletes the associated zoekt_replica is wrapped inside a transaction which increases the data integrity.

indices = Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE)
return unless indices.exists?

log_metadata = {}

ApplicationRecord.transaction do
  deleted_count = Search::Zoekt::Replica.id_in(indices.select(:zoekt_replica_id)).delete_all
  updated_count = indices.update_all(state: :evicted)

  log_metadata[:replicas_deleted_count] = deleted_count
  log_metadata[:indices_updated_count] = updated_count
end

The only reason I can think of is some kind of race condition between these two lines

  • Search::Zoekt::Replica.id_in(indices.select(:zoekt_replica_id)).delete_all
  • indices.update_all(state: :evicted)

Since rails fire the SQL only when the scope is called. So when the first line was called indices executed the SQL with Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE) and deleted the required replicas. In the next line when indices is called again it fires the SQL query Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE) but this time a new pending_eviction record is added in the scope. This index is moved to evicted without deleting its zoekt_replica.

The fix is to first save in a local variable zoekt_replica_id_to_be_deleted all the zoekt_replica_id which will be deleted. Update the zoekt_indices to evicted only for those indices which belong to zoekt_replica_id_to_be_deleted. And finally delete all the replicas with zoekt_replica_id_to_be_deleted.

Also added a migration to fix delete the zoekt_replica of all evicted zoekt_indices.

References

Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Related to #523636 (closed)

Edited by Ravi Kumar

Merge request reports

Loading