Geo secondary repository verification gets stuck after 1000 failed repositories

While testing Geo secondary repository verification, I noticed that the same failed repositories were being re-verified over and over.

To find repositories to verify, we select registries where either repository_verification_checksum IS NULL or wiki_verification_checksum IS NULL (and the repository state on the primary has a checksum for at least one of them), limited to a batch size of 1000.

https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/finders/geo/project_registry_finder.rb#L188-203

    # Find all registries that repository or wiki need verification
    # @return [ActiveRecord::Relation<Geo::ProjectRegistry>] list of registries that need verification
    def fdw_find_registries_to_verify(batch_size:)
      Geo::ProjectRegistry
        .joins(fdw_inner_join_repository_state)
        .where(
          local_registry_table[:repository_verification_checksum].eq(nil).or(
            local_registry_table[:wiki_verification_checksum].eq(nil)
          )
        )
        .where(
          fdw_repository_state_table[:repository_verification_checksum].not_eq(nil).or(
            fdw_repository_state_table[:wiki_verification_checksum].not_eq(nil)
          )
        ).limit(batch_size)
    end

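A failed verification leaves the checksum as NULL, and the query neither orders its results nor excludes registries that were already attempted. This means that once we have 1000 failed repositories, we keep selecting the same 1000 failed repositories and never move on to the rest.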
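To make the failure mode concrete, here is a small plain-Ruby simulation (not the real ActiveRecord/FDW code) with the batch size scaled down from 1000 to 3. `StuckRegistry` and `verify` are hypothetical stand-ins. The real SQL has no `ORDER BY`, so row order is unspecified; the sketch assumes the database keeps returning rows in the same order, which is the case that gets stuck.

```ruby
# Hypothetical in-memory stand-in for Geo::ProjectRegistry rows.
StuckRegistry = Struct.new(:id, :repository_verification_checksum, :wiki_verification_checksum)

# Mirrors the WHERE clause: either checksum is still NULL (nil here).
# Assumes a stable id order, since the SQL has no ORDER BY.
def find_registries_to_verify(registries, batch_size:)
  registries
    .select { |r| r.repository_verification_checksum.nil? || r.wiki_verification_checksum.nil? }
    .first(batch_size)
end

# Hypothetical verifier: a failure leaves both checksums nil,
# a success fills them in.
def verify(registry, failing_ids)
  return if failing_ids.include?(registry.id)

  registry.repository_verification_checksum = "sha-#{registry.id}"
  registry.wiki_verification_checksum = "sha-#{registry.id}"
end

registries = (1..10).map { |id| StuckRegistry.new(id, nil, nil) }
failing_ids = [1, 2, 3] # enough failures to fill a whole batch of 3

# Three scheduler runs in a row.
batches = Array.new(3) do
  batch = find_registries_to_verify(registries, batch_size: 3)
  batch.each { |r| verify(r, failing_ids) }
  batch.map(&:id)
end

p batches # => [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
```

Registries 4 to 10 are never picked up, because the three failures fill every batch.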

https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/workers/geo/repository_verification/secondary/scheduler_worker.rb#L21-34

        def load_pending_resources
          finder.find_registries_to_verify(batch_size: db_retrieve_batch_size)
                .pluck(:id)
        end

        def schedule_job(registry_id)
          job_id = Geo::RepositoryVerification::Secondary::SingleWorker.perform_async(registry_id)

          { id: registry_id, job_id: job_id } if job_id
        end

        def finder
          @finder ||= Geo::ProjectRegistryFinder.new
        end

This was one of the original reasons for the last_verification_at dates: so that we could ensure we didn't keep pulling the same records.
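A minimal plain-Ruby sketch of how recording the last attempt could unstick the queue, assuming a `last_verification_at` value is set on every attempt, whether it succeeds or fails. Pending registries are then handed out never-attempted first, oldest attempt next (roughly `ORDER BY last_verification_at ASC NULLS FIRST` in PostgreSQL terms). `TrackedRegistry`, the integer clock, and the single `checksum` column are hypothetical simplifications, not the actual Geo schema.

```ruby
# Hypothetical registry with a last_verification_at "timestamp"
# (a simulated integer clock tick; nil means never attempted).
TrackedRegistry = Struct.new(:id, :checksum, :last_verification_at)

# Pending registries ordered like ORDER BY last_verification_at ASC NULLS FIRST,
# with id as a tie-breaker so the order is deterministic.
def find_registries_to_verify_oldest_first(registries, batch_size:)
  registries
    .select { |r| r.checksum.nil? }
    .sort_by { |r| [r.last_verification_at.nil? ? 0 : 1, r.last_verification_at.to_i, r.id] }
    .first(batch_size)
end

registries = (1..6).map { |id| TrackedRegistry.new(id, nil, nil) }
failing_ids = [1, 2, 3]

batches = (1..3).map do |tick|
  batch = find_registries_to_verify_oldest_first(registries, batch_size: 3)
  batch.each do |r|
    r.last_verification_at = tick # record the attempt even when it fails
    r.checksum = "sha-#{r.id}" unless failing_ids.include?(r.id)
  end
  batch.map(&:id)
end

p batches # => [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
```

Failed registries are still retried, but only after the registries that were never attempted have had their turn.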

/cc @dbalexandre @nick.thomas @stanhu