Single verification framework (discovery)
I think we need a common verification framework so that adding new resources is not complex. We aim to add verification for uploads &1817 (closed), for design repositories #32467 (closed), and for container repositories. In the future, we will add even more resources.
What if we create a single scheduler for all kinds of resources? It would look like Geo::RepositoryVerification::Primary::ShardWorker, but with a broader responsibility. I propose creating separate schedulers for Primary and Secondary. The actual worker that calculates the checksum would need only two arguments: resource_type and id. Every existing resource can be identified this way, including uploads, designs, container repositories, and even wikis and repositories. Of course, the ordering system should take priorities into account and should prevent the system from clogging when one important resource type takes all the capacity.
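To make the proposal a bit more concrete, here is a minimal sketch in plain Ruby of the two pieces described above: a generic worker that takes only resource_type and id, and a scheduler that respects priorities while capping how much of a batch a single resource type can take. Everything in it is an assumption for illustration only: the names VerificationWorker, VerificationScheduler, RESOURCE_LOADERS, the capacity and max_per_type settings, and the fake loaders are hypothetical and do not exist in the codebase. A real implementation would presumably be built on Sidekiq workers and real models.

```ruby
require 'digest'

# Hypothetical registry: maps a resource_type to a callable that returns the
# data to checksum for a given id. The lambdas here return fake data; real
# loaders would read an upload, a design repository, and so on.
RESOURCE_LOADERS = {
  'upload'               => ->(id) { "upload-bytes-#{id}" },
  'design_repository'    => ->(id) { "design-refs-#{id}" },
  'container_repository' => ->(id) { "container-tags-#{id}" }
}.freeze

# Hypothetical generic worker: resource_type and id are its only arguments.
class VerificationWorker
  def perform(resource_type, id)
    loader = RESOURCE_LOADERS.fetch(resource_type) do
      raise ArgumentError, "unknown resource type: #{resource_type}"
    end

    Digest::SHA256.hexdigest(loader.call(id))
  end
end

# Hypothetical scheduler: builds a batch of at most `capacity` jobs. Types are
# visited in priority order (lower number first), and each type contributes at
# most `max_per_type` jobs, so one type cannot fill the whole batch.
class VerificationScheduler
  def initialize(capacity:, max_per_type:, priorities:)
    @capacity = capacity
    @max_per_type = max_per_type
    @priorities = priorities
  end

  # pending: { resource_type => [ids awaiting verification] }
  # Returns an array of [resource_type, id] pairs.
  def next_batch(pending)
    batch = []
    types = pending.keys.sort_by { |type| @priorities.fetch(type, Float::INFINITY) }

    types.each do |type|
      remaining = @capacity - batch.size
      break if remaining <= 0

      pending[type].first([remaining, @max_per_type].min).each do |id|
        batch << [type, id]
      end
    end

    batch
  end
end
```

With capacity: 5 and max_per_type: 3, a backlog of six uploads, one design repository, and two container repositories would yield three uploads, one design repository, and one container repository in the next batch, so the large upload backlog does not starve the other types. Each pair can then be passed to VerificationWorker#perform. Adding a new resource type in this sketch means adding one entry to the registry and one priority.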
I created this issue after brainstorming on #32467 (closed), where I noticed that adding a new resource is incredibly hard right now.
PS: Sorry, I think I saw a similar issue somewhere, but I could not find it.
UPDATE: Requirements:
- ensuring a primary checksum
- running everything automatically at scale
- producing metrics
- reverifying
- configurable rate limiting
- optimizing checksumming of remotely stored files
cc @geo-team