Git fsck parallelism not properly spread out over shards on Geo secondary

In gitlab-org/gitlab-ee!6287 we added parallelism to git fsck so that it runs one process per shard.

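But that MR did not implement this correctly for a Geo secondary. On the secondary, never_checked_project_ids does not restrict its query to the worker's shard, so the per-shard workers all pick projects from the same pool instead of each taking the projects on its own shard.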

On the primary:

    def never_checked_project_ids(batch_size)
      # projects_on_shard limits the candidates to this worker's shard
      projects_on_shard.where(last_repository_check_at: nil)
        .where('created_at < ?', 24.hours.ago)
        .limit(batch_size).pluck(:id)
    end

On the secondary:

    def never_checked_project_ids(batch_size)
      return super unless ::Gitlab::Geo.secondary?

      # No shard filter here: every shard's worker queries the same registry rows
      Geo::ProjectRegistry.synced_repos.synced_wikis
        .where(last_repository_check_at: nil)
        .where('last_repository_synced_at < ?', 24.hours.ago)
        .where('last_wiki_synced_at < ?', 24.hours.ago)
        .limit(batch_size).pluck(:project_id)
    end
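
To make the difference concrete, here is a plain-Ruby model (no Rails, no database) of the two queries above. Project, RegistryEntry, the shard names and the sample data are hypothetical stand-ins, and the 24-hour age conditions are left out for brevity. Like the secondary query, the registry model carries no shard information.

```ruby
# Hypothetical stand-ins for the projects table and Geo::ProjectRegistry rows;
# only the fields the two queries use are modelled.
Project = Struct.new(:id, :repository_storage, :last_repository_check_at, keyword_init: true)
RegistryEntry = Struct.new(:project_id, :last_repository_check_at, keyword_init: true)

PROJECTS = [
  Project.new(id: 1, repository_storage: "default", last_repository_check_at: nil),
  Project.new(id: 2, repository_storage: "default", last_repository_check_at: nil),
  Project.new(id: 3, repository_storage: "storage2", last_repository_check_at: nil),
  Project.new(id: 4, repository_storage: "storage2", last_repository_check_at: nil)
].freeze

# One registry row per project, with no shard column (mirrors the secondary query).
REGISTRY = PROJECTS.map { |p| RegistryEntry.new(project_id: p.id, last_repository_check_at: nil) }.freeze

# Primary: candidates are restricted to the shard the worker handles.
def primary_candidates(shard, batch_size)
  PROJECTS.select { |p| p.repository_storage == shard && p.last_repository_check_at.nil? }
          .first(batch_size).map(&:id)
end

# Secondary (current behaviour): the shard argument is ignored,
# so every worker selects from the same pool.
def secondary_candidates(_shard, batch_size)
  REGISTRY.select { |r| r.last_repository_check_at.nil? }
          .first(batch_size).map(&:project_id)
end

%w[default storage2].each do |shard|
  puts "#{shard}: primary=#{primary_candidates(shard, 2).inspect} " \
       "secondary=#{secondary_candidates(shard, 2).inspect}"
end
```

In this model the primary gives each shard its own projects ([1, 2] and [3, 4]), while the secondary hands [1, 2] to both shard workers.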

Proposed solution

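I'm not sure yet. Maybe cross-query the tracking database and the main database with FDW (PostgreSQL foreign data wrapper), so the registry rows can be joined against the projects on the worker's shard?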
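
One FDW-free alternative, sketched here only as an assumption and not as an agreed fix, is a two-step query: read the shard's project IDs from the replicated main database, then filter the registry by those IDs in bounded slices. Plain arrays stand in for both databases, and every name below (ProjectRow, RegistryRow, MAIN_DB, TRACKING_DB, the helper methods) is hypothetical.

```ruby
require "set"

# Hypothetical stand-ins: MAIN_DB mimics the replicated projects table,
# TRACKING_DB mimics Geo::ProjectRegistry in the tracking database.
ProjectRow = Struct.new(:id, :repository_storage)
RegistryRow = Struct.new(:project_id, :last_repository_check_at)

MAIN_DB = [
  ProjectRow.new(1, "default"), ProjectRow.new(2, "storage2"),
  ProjectRow.new(3, "default"), ProjectRow.new(4, "storage2")
].freeze
TRACKING_DB = [
  RegistryRow.new(1, nil), RegistryRow.new(2, nil),
  RegistryRow.new(3, Time.now), RegistryRow.new(4, nil)
].freeze

# Step 1 (main database): IDs of the projects stored on this shard.
def project_ids_on_shard(shard)
  MAIN_DB.select { |p| p.repository_storage == shard }.map(&:id)
end

# Step 2 (tracking database): keep never-checked registry rows whose project
# is on the shard, one slice of IDs at a time so that a real
# "WHERE project_id IN (...)" list would stay bounded.
def never_checked_project_ids_on_shard(shard, batch_size, slice_size: 1000)
  result = []
  project_ids_on_shard(shard).each_slice(slice_size) do |ids|
    wanted = ids.to_set
    TRACKING_DB.each do |row|
      next unless wanted.include?(row.project_id) && row.last_repository_check_at.nil?

      result << row.project_id
      return result if result.size >= batch_size # stop once the batch is full
    end
  end
  result
end

p never_checked_project_ids_on_shard("default", 10)
p never_checked_project_ids_on_shard("storage2", 10)
```

The trade-off is that step 1 pulls every project ID on the shard into Ruby, which may be costly on large instances; doing the join inside PostgreSQL via FDW would avoid that round trip.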

Edited Aug 05, 2018 by Toon Claes