
Pick the newest job for processing when collapsing jobs

Sami Hiltunen requested to merge smh-collapse-jobs-pick-oldest into master

Currently, Praefect picks the oldest job for processing when collapsing jobs. When a job is completed, other jobs with the same (lock_id, change_type) are also acknowledged as complete, which avoids redundant work by collapsing replication jobs that would perform the same change.

Repository generation tracking was introduced in 7bbc7cc4. As the generation numbers are propagated in replication jobs, skipping newer jobs means we are not acknowledging the higher generations we've replicated. This leads to repositories being considered outdated even if they're not. To avoid this while still keeping the benefits of job collapsing, this commit changes the collapsing to prefer the newest job for a given repository while still maintaining the queueing order between repositories.

For the following queue of jobs:

J1/R1 -> J2/R2 -> J3/R2 -> J4/R1 -> J5/R3

Dequeuing two events should yield:

J4 -> J3

The queueing order of the repositories is maintained, while for a given repository the latest job is picked: R1 was queued first, so its newest job J4 comes first, followed by J3, the newest job of R2.
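The selection rule in the example above can be sketched in Go as follows. This is only an illustration under assumptions, not Praefect's implementation: the `Job` type and the `dequeue` function are hypothetical names, the queue is modelled as an in-memory slice ordered oldest first, and the repository name stands in for the full (lock_id, change_type) key.

```go
package main

import "fmt"

// Job is a hypothetical, simplified replication job.
type Job struct {
	ID   int    // higher ID means a newer job
	Repo string // stands in for the (lock_id, change_type) key
}

// dequeue returns up to n jobs. Repositories are served in the order in
// which their oldest job was queued, and for each repository the newest
// job is picked. The queue is assumed to be ordered oldest first.
func dequeue(queue []Job, n int) []Job {
	var order []string         // repositories in order of first appearance
	newest := map[string]Job{} // newest job seen so far per repository

	for _, j := range queue {
		cur, seen := newest[j.Repo]
		if !seen {
			order = append(order, j.Repo)
		}
		if !seen || j.ID > cur.ID {
			newest[j.Repo] = j
		}
	}

	if n < 0 {
		n = 0
	}
	if n > len(order) {
		n = len(order)
	}

	picked := make([]Job, 0, n)
	for _, repo := range order[:n] {
		picked = append(picked, newest[repo])
	}
	return picked
}

func main() {
	// J1/R1 -> J2/R2 -> J3/R2 -> J4/R1 -> J5/R3
	queue := []Job{
		{ID: 1, Repo: "R1"},
		{ID: 2, Repo: "R2"},
		{ID: 3, Repo: "R2"},
		{ID: 4, Repo: "R1"},
		{ID: 5, Repo: "R3"},
	}
	for _, j := range dequeue(queue, 2) {
		fmt.Printf("J%d/%s\n", j.ID, j.Repo)
	}
}
```

With the queue from the example, dequeuing two jobs yields J4 and then J3.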

Additionally, this commit only acknowledges jobs older than the completed job, as newer jobs might contain higher generation numbers.

The source node is no longer considered when acknowledging a job, because replicating first from Node A and then from Node B is redundant: Node B's changes overwrite Node A's.
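The two acknowledgement rules above can be sketched as follows. Again, this is a hedged illustration rather than the actual change: `Job`, its fields, and `acknowledge` are hypothetical names, and the queue is an in-memory slice instead of Praefect's real queue storage.

```go
package main

import "fmt"

// Job is a hypothetical, simplified replication job.
type Job struct {
	ID         int // higher ID means a newer job
	LockID     string
	ChangeType string
	SourceNode string
	Done       bool
}

// acknowledge marks the completed job and every older job with the same
// (LockID, ChangeType) as done, and returns the IDs it acknowledged.
// SourceNode is deliberately ignored. Newer jobs stay in the queue because
// they may carry a higher generation number.
func acknowledge(queue []Job, completed Job) []int {
	var acked []int
	for i := range queue {
		j := &queue[i]
		sameKey := j.LockID == completed.LockID && j.ChangeType == completed.ChangeType
		if sameKey && j.ID <= completed.ID && !j.Done {
			j.Done = true
			acked = append(acked, j.ID)
		}
	}
	return acked
}

func main() {
	queue := []Job{
		{ID: 1, LockID: "R1", ChangeType: "update", SourceNode: "A"},
		{ID: 2, LockID: "R1", ChangeType: "update", SourceNode: "B"},
		{ID: 3, LockID: "R1", ChangeType: "update", SourceNode: "A"},
		{ID: 4, LockID: "R2", ChangeType: "update", SourceNode: "A"},
	}
	// Completing job 2 also acknowledges job 1, even though job 1 has a
	// different source node. Jobs 3 and 4 are left untouched.
	fmt.Println(acknowledge(queue, queue[1]))
}
```

In this sketch, job 3 remains pending even though it targets the same repository, since it is newer than the completed job.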

Edited by Sami Hiltunen
