Zoekt indices stuck in evicted on STG
What does this MR do and why?
It is hard to understand how, on staging, there are indices with an evicted state and still having zoekt_replica_id. The logic that moves the indices to evicted and deletes the associated zoekt_replica is wrapped inside a transaction which increases the data integrity.
indices = Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE)
return unless indices.exists?
log_metadata = {}
ApplicationRecord.transaction do
deleted_count = Search::Zoekt::Replica.id_in(indices.select(:zoekt_replica_id)).delete_all
updated_count = indices.update_all(state: :evicted)
log_metadata[:replicas_deleted_count] = deleted_count
log_metadata[:indices_updated_count] = updated_count
end
The only reason I can think of is some kind of race condition between these two lines
Search::Zoekt::Replica.id_in(indices.select(:zoekt_replica_id)).delete_allindices.update_all(state: :evicted)
Since rails fire the SQL only when the scope is called. So when the first line was called indices executed the SQL with Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE) and deleted the required replicas. In the next line when indices is called again it fires the SQL query Search::Zoekt::Index.pending_eviction.ordered.limit(BATCH_SIZE) but this time a new pending_eviction record is added in the scope. This index is moved to evicted without deleting its zoekt_replica.
The fix is to first save in a local variable zoekt_replica_id_to_be_deleted all the zoekt_replica_id which will be deleted. Update the zoekt_indices to evicted only for those indices which belong to zoekt_replica_id_to_be_deleted. And finally delete all the replicas with zoekt_replica_id_to_be_deleted.
Also added a migration to fix delete the zoekt_replica of all evicted zoekt_indices.
References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
| Before | After |
|---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Related to #523636 (closed)