
Geo: Bandaid for Registry rows stuck in sync state Queued

From #419370 (comment 1596281181):

I think this may be where we set the registry record's state to pending without clearing last_synced_at. It probably doesn't always trigger the issue because that line is immediately followed by sync_repository, which quickly moves the state to started.

I suspect the issue only occurs when the lease is already taken, because the service then exits without moving the state to started. That would make it mostly an issue for frequently mutated resources.

Possible bandaid: Clear last_synced_at when setting state to pending.
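
To make the suspected failure mode and the proposed bandaid concrete, here is a minimal sketch in plain Ruby. It is not GitLab's actual code: the class `RegistrySketch` and its method names are hypothetical, and the only value taken from this issue is that state 0 is the pending state targeted by the workaround query. It models a row that returns to pending with a stale last_synced_at, next to the variant that clears the timestamp.

```ruby
# Hypothetical stand-in for a Geo registry row; NOT GitLab's real model.
class RegistrySketch
  PENDING = 0 # from the workaround query in this issue (state: ['0'])
  STARTED = 1 # assumed value, for illustration only

  attr_reader :state, :last_synced_at

  def initialize(state: STARTED, last_synced_at: Time.now)
    @state = state
    @last_synced_at = last_synced_at
  end

  # Suspected current behaviour: state goes back to pending,
  # but the old sync timestamp is left in place.
  def mark_pending_without_clearing
    @state = PENDING
  end

  # Proposed bandaid: clear last_synced_at whenever state is set to pending.
  def mark_pending
    @state = PENDING
    @last_synced_at = nil
  end

  # Same condition the workaround query filters on:
  # state 0 with a non-null last_synced_at.
  def matches_workaround_filter?
    @state == PENDING && !@last_synced_at.nil?
  end
end

stale = RegistrySketch.new
stale.mark_pending_without_clearing
puts stale.matches_workaround_filter?

fixed = RegistrySketch.new
fixed.mark_pending
puts fixed.matches_workaround_filter?
```

If the lease is taken and nothing moves the first row to started, it stays in the condition the workaround has to clean up; the second row never enters it.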

Backport the fix to 16.3 and 16.4.

Workaround to unstick any permanently Queued items

On a Puma, Sidekiq, or Geo Log Cursor node in the secondary site:

gitlab-rails runner "Geo::ProjectRepositoryRegistry.where(state: ['0']).where('last_synced_at is not null').update_all(last_synced_at: nil)"

For example, use a cron job to run this every 10 minutes.

Edited by Michael Kozono