Geo: Registry rows stuck in sync state Queued
Problem
From #426778 (comment 1588816504):
Registry rows can get stuck in `pending` if they were synced before. Customers likely won't notice, which puts the affected repos at risk of data loss in the event of a disaster.
More Details
From !117808 (comment 1446592730):
- It's possible for a registry row to be `pending` and synced before (`last_synced_at` is not `NULL`)
- The codebase currently doesn't set a registry row which has been synced before to `pending`, but it would be reasonable for a developer or troubleshooter to do so without realizing it will get stuck in that state.
- `RegistrySyncWorker`s sync "never attempted sync" and "needs sync again"
  - But both of those queries do not include rows which are `pending` and have been synced before
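The gap described above can be reproduced in a small sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and everything except `state = 0` meaning pending and the `last_synced_at` column is an assumption made for illustration: the `registry` table name, the `retry_at` column, the other state values, and the exact definitions of the two queries ("never attempted sync" as pending with `NULL` `last_synced_at`, "needs sync again" as failed with a retry time in the past).

```python
import sqlite3

# Only PENDING = 0 comes from the issue text; the other values are
# placeholders chosen for this illustration.
PENDING, STARTED, SYNCED, FAILED = 0, 1, 2, 3


def build_demo_db():
    """Create an in-memory table with one row per interesting case."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registry ("
        " id INTEGER PRIMARY KEY,"
        " state INTEGER NOT NULL,"
        " last_synced_at TEXT,"
        " retry_at TEXT)"
    )
    rows = [
        (1, PENDING, None, None),                 # never attempted sync
        (2, FAILED, "2023-01-01", "2023-01-02"),  # failed, retry is due
        (3, SYNCED, "2023-01-03", None),          # healthy
        (4, PENDING, "2023-01-04", None),         # pending AND synced before
    ]
    conn.executemany("INSERT INTO registry VALUES (?, ?, ?, ?)", rows)
    return conn


def never_attempted_sync(conn):
    # Assumed definition: pending rows that have no sync timestamp.
    sql = ("SELECT id FROM registry"
           " WHERE state = ? AND last_synced_at IS NULL ORDER BY id")
    return [row[0] for row in conn.execute(sql, (PENDING,))]


def needs_sync_again(conn, now):
    # Assumed definition: failed rows whose retry time has passed.
    sql = ("SELECT id FROM registry"
           " WHERE state = ? AND retry_at < ? ORDER BY id")
    return [row[0] for row in conn.execute(sql, (FAILED, now))]


conn = build_demo_db()
picked = set(never_attempted_sync(conn)) | set(needs_sync_again(conn, "2023-02-01"))

# Pending rows that neither query would ever hand to a worker.
all_pending = conn.execute(
    "SELECT id FROM registry WHERE state = ? ORDER BY id", (PENDING,)
)
stuck = [row[0] for row in all_pending if row[0] not in picked]
print(stuck)
```

Under these assumed definitions, row 4 is the only pending row that neither query returns, which is the stuck case this issue describes.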
Proposal
There are multiple options:
1. Replace usages of "never attempted sync" with `pending`, but `ORDER BY last_synced_at ASC NULLS FIRST`. Requires a partial index on large registry tables: `USING btree (last_synced_at NULLS FIRST) WHERE (state = 0)`.
2. Modify "needs sync again" to include rows which are `pending` and have been synced before. If PG is clever enough, we can probably add the same partial index as in 1. If not, then the index may need to be quite complex.
3. Modify "needs sync again" logic to perform an additional query for `pending` rows which have been synced before, instead of one complex query.
4. Add a PG check constraint that enforces `last_synced_at` must be `NULL` if `pending`, e.g. `CONSTRAINT pending_means_never_attempted_sync CHECK (state != 0 OR last_synced_at IS NULL)`. Requires a data migration to clean self-managed data first.
5. ??
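As a rough sketch of the check-constraint option, the snippet below uses Python's built-in `sqlite3` in place of PostgreSQL. The `registry` table and the non-pending state value `2` are hypothetical. The predicate is written as `state != 0 OR last_synced_at IS NULL` so that it matches the stated rule that `last_synced_at` must be `NULL` while a row is pending.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry ("
    " id INTEGER PRIMARY KEY,"
    " state INTEGER NOT NULL,"
    " last_synced_at TEXT,"
    " CONSTRAINT pending_means_never_attempted_sync"
    " CHECK (state != 0 OR last_synced_at IS NULL))"
)


def try_insert(conn, row):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO registry VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, (1, 0, None)),          # pending, never synced: allowed
    try_insert(conn, (2, 2, "2023-01-03")),  # not pending, has timestamp: allowed
    try_insert(conn, (3, 0, "2023-01-04")),  # pending but synced before: rejected
]
print(results)
```

The same constraint would also reject an `UPDATE` that flips an already-synced row back to state `0` without clearing `last_synced_at`, which is why existing data would have to be cleaned up before adding it.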
I prefer 1. If not, then maybe 3 next.
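A minimal sketch of what option 1 could look like, again using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `registry` table, the index name, the `next_pending_batch` helper, and the non-pending state value are all hypothetical. SQLite already places `NULL`s first in ascending order, so the query omits `NULLS FIRST`; in PostgreSQL the clause would be needed because ascending order puts `NULL`s last by default.

```python
import sqlite3

PENDING = 0  # matches the `state = 0` predicate quoted in the proposal

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry ("
    " id INTEGER PRIMARY KEY,"
    " state INTEGER NOT NULL,"
    " last_synced_at TEXT)"
)
# Partial index that covers only pending rows (index name is made up).
conn.execute(
    "CREATE INDEX index_registry_pending_on_last_synced_at"
    " ON registry (last_synced_at) WHERE state = 0"
)
conn.executemany(
    "INSERT INTO registry VALUES (?, ?, ?)",
    [
        (1, PENDING, "2023-01-04"),  # pending, synced recently
        (2, PENDING, None),          # pending, never synced
        (3, 2, "2023-01-03"),        # not pending (state value is illustrative)
        (4, PENDING, "2023-01-01"),  # pending, synced long ago
    ],
)


def next_pending_batch(conn, limit):
    """All pending rows: never-synced first, then oldest sync first."""
    sql = ("SELECT id FROM registry WHERE state = ?"
           " ORDER BY last_synced_at ASC LIMIT ?")
    return [row[0] for row in conn.execute(sql, (PENDING, limit))]


batch = next_pending_batch(conn, 10)
print(batch)
```

With this ordering, never-synced rows keep their priority, while pending rows that were synced before are still picked up instead of being skipped.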
Edited by Michael Kozono