Zoekt Newly created indices are instantly evicted
What does this MR do and why?
The problem we trying to solve is that the newly created zoekt_index could get evicted without trying to bump the reserved_storage_bytes. The following are the two scenarios:
For the empty namespaces
This is happening here, required_storage_bytes is 0 from plan. In the ProvisioningService we are setting the reserved_storage_bytes to 1 kilobyte if the required_bytes = 0 https://gitlab.com/gitlab-org/gitlab/-/blob/bddaa8ee11f219638ed7d701351c914b73f31e48/ee/app/services/search/zoekt/provisioning_service.rb#L81
So, an index will be created with these attributes:
reserved_storage_bytes: 1024,
used_storage_bytes: 0
Now this worker UpdateIndexUsedStorageBytesEventWorker will call update_storage_bytes! which will update used_storage_bytes to DEFAULT_USED_STORAGE_BYTES(1.kilobyte) but it will skip to call refresh_reserved_storage_bytes because if condition will fail. Now both used_storage_bytes and reserved_storage_bytes will become 1.kilobyte. Therefore watermark_level will be set to critical_watermark_exceeded in before_save callback. Thus evicting this index.
For non-empty namespaces
This is more like an edge case.
The index gets created with a non-zero ideal reserved_storage_bytes and used_storage_bytes with DEFAULT_USED_STORAGE_BYTES(1.kilobyte). Now UpdateIndexUsedStorageBytesEventWorker will start updating used_storage_bytes by summing the size_bytes from zoekt_repositories. Now the edge case is, if new big zoekt_repositories got added or existing zoekt_repositories size got changed before the index gets ready. In this case, we are continuously increasing the used_storage_bytes but skipping the update of the reserved_storage_bytes because of this condition. The reserved_storage_bytes is already set according to the namespace size at the time of rollout. But during the rollout, the namespace got increased. In the before_save callback there is a chance that the storage_percent_used will make the index critical_watermark_exceeded. Thus evicting the index even before moving to ready.
The fix is to
-
Move the watermark_level setting from
after_savecallback toupdate_storage_bytesafter the call ofrefresh_used_storage_bytesandrefresh_reserved_storage_bytes. This ensures thatused_storage_bytesandreserved_storage_bytesare up-to-date before setting the watermark level. -
Inside the
refresh_reserved_storage_bytes, don't allow the reduction ofreserved_storage_bytesif the index is notreadybut allows the expansion. -
Other changes I have made in this MR are to move some
publicmethods toprivate. -
Refactor the specs
-
Rename
update_storage_bytestoupdate_storage_bytes_and_watermark_level! -
Remove the
DEFAULT_RESERVED_STORAGE_BYTES. It is not needed. This constant was used wheneverreserved_storage_byteswas0. With the new logic,reserved_storage_bytescan only be0, when an index is created for an empty namespace with0reserved_storage_bytes.UpdateIndexUsedStorageBytesEventWorkerworker will update theused_storage_bytesto1.kilobyte.refresh_reserved_storage_byteswill also try to bump thereserved_storage_bytesbut the node has0unclaimed_storage_bytes. In this casereserved_storage_byteswill remain0. But I think this is fine. In the next stepwatermark_levelwill be set tocritical_watermark_leveland eventually will be evicted.
References
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #521972 (closed)