Git fsck parallelism not properly spread out over shards on Geo secondary
In gitlab-org/gitlab-ee!6287 we added parallelism for git fsck, running one process per shard.
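But that MR did not implement this correctly on a Geo secondary: the secondary's query is not scoped to a shard, so each per-shard process selects from the same set of projects.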
On the primary:
def never_checked_project_ids(batch_size)
  projects_on_shard.where(last_repository_check_at: nil)
    .where('created_at < ?', 24.hours.ago)
    .limit(batch_size).pluck(:id)
end
On the secondary:
def never_checked_project_ids(batch_size)
  return super unless ::Gitlab::Geo.secondary?

  Geo::ProjectRegistry.synced_repos.synced_wikis
    .where(last_repository_check_at: nil)
    .where('last_repository_synced_at < ?', 24.hours.ago)
    .where('last_wiki_synced_at < ?', 24.hours.ago)
    .limit(batch_size).pluck(:project_id)
end
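To make the difference concrete, here is a minimal plain-Ruby sketch that uses in-memory stand-ins instead of the real models. `Project`, `RegistryEntry`, the shard names, and both helper methods are hypothetical, the time-based filters are left out for brevity, and the sketch assumes the registry rows carry no shard information. The primary-style selection depends on the shard, while the secondary-style selection returns the same batch for every shard.

```ruby
# Hypothetical in-memory stand-ins for the projects table and the Geo registry.
Project = Struct.new(:id, :repository_storage, :last_repository_check_at)
RegistryEntry = Struct.new(:project_id, :last_repository_check_at)

PROJECTS = [
  Project.new(1, 'default', nil),
  Project.new(2, 'default', nil),
  Project.new(3, 'nfs-2', nil),
  Project.new(4, 'nfs-2', nil)
].freeze

# Assumption: a registry row only knows the project ID, not the shard.
REGISTRY = PROJECTS.map { |p| RegistryEntry.new(p.id, nil) }.freeze

# Primary-style selection: scoped to the shard the process was started for.
def primary_ids(shard, batch_size)
  PROJECTS
    .select { |p| p.repository_storage == shard && p.last_repository_check_at.nil? }
    .first(batch_size)
    .map(&:id)
end

# Secondary-style selection: the shard argument cannot influence the result,
# so every per-shard process gets the same batch.
def secondary_ids(_shard, batch_size)
  REGISTRY
    .select { |r| r.last_repository_check_at.nil? }
    .first(batch_size)
    .map(&:project_id)
end

puts primary_ids('default', 2).inspect
puts primary_ids('nfs-2', 2).inspect
puts secondary_ids('default', 2).inspect
puts secondary_ids('nfs-2', 2).inspect
```

In this toy setup the two `primary_ids` calls return disjoint batches, whereas the two `secondary_ids` calls return identical ones, which is the behavior described in this issue.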
Proposed solution
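I'm not sure yet. Maybe cross-query the databases with FDW (foreign data wrappers), so that the registry query can be restricted to projects on the current shard?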
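As a rough illustration of what a shard-scoped selection on the secondary would need to compute, the following plain-Ruby sketch emulates the cross-database join in memory. It is not the actual fix: `ShardedProject`, `RegistryRow`, and `never_checked_ids_on_shard` are hypothetical names, the sync-time filters are omitted, and a real implementation would push this join into the database.

```ruby
# Hypothetical in-memory stand-ins; the real data lives in two databases.
ShardedProject = Struct.new(:id, :repository_storage)
RegistryRow = Struct.new(:project_id, :last_repository_check_at)

# Emulates the join: keep only never-checked registry rows whose project
# is stored on the given shard, then take one batch of project IDs.
def never_checked_ids_on_shard(registry, projects, shard, batch_size)
  ids_on_shard = projects
    .select { |p| p.repository_storage == shard }
    .map(&:id)

  registry
    .select { |r| r.last_repository_check_at.nil? && ids_on_shard.include?(r.project_id) }
    .first(batch_size)
    .map(&:project_id)
end

projects = [
  ShardedProject.new(1, 'default'),
  ShardedProject.new(2, 'nfs-2'),
  ShardedProject.new(3, 'nfs-2')
]
registry = [
  RegistryRow.new(1, nil),
  RegistryRow.new(2, nil),
  RegistryRow.new(3, Time.now) # already checked, so it is skipped
]

puts never_checked_ids_on_shard(registry, projects, 'nfs-2', 10).inspect
```

With a selection like this, each per-shard process would only pick up registry entries for projects on its own shard.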
Edited by Toon Claes