Remove processed replication jobs to prevent the table from growing rapidly

Problem to solve

The Praefect database table that tracks replication jobs is never cleaned up. Every write operation adds records to it, so the table grows rapidly. This is a performance problem: in practice it makes the read distribution query slow.

Further details

Once replication is done, the replication event remains in the database in the completed or dead state.
Most of those records are no longer needed and could be safely removed.
The only completed events we rely on are the latest ones for each repository.
They are used to determine whether the repository is up to date (has no unprocessed events).
The fewer records we have in the database, the faster the queries will be and the lower the resource consumption.

Proposal

Remove jobs when they are no longer needed, to prevent the read distribution query from becoming slow.

This could be done in a number of ways:

  • directly delete jobs when they are successfully completed or dead, at least for jobs of the update change type
  • create a background task that runs periodically and removes those unused records from the database
  • move them into another table if we are not ready to delete them
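
As a rough illustration of the cleanup rule, the sketch below deletes every completed or dead event except the latest completed event of each repository. It is only a sketch under stated assumptions: the table name `replication_events`, its columns, the state names, and the function `remove_processed_events` are hypothetical and not taken from Praefect's actual schema; "latest" is assumed to mean the highest `id`; and SQLite is used only so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: Praefect's real table and column names may differ.
SCHEMA = """
CREATE TABLE replication_events (
    id         INTEGER PRIMARY KEY,
    repository TEXT NOT NULL,
    state      TEXT NOT NULL
)
"""

# Delete every completed or dead event except the latest completed event
# of each repository, which is still needed for the up-to-date check.
# Assumption: a higher id means a later event.
CLEANUP = """
DELETE FROM replication_events
WHERE state IN ('completed', 'dead')
  AND id NOT IN (
      SELECT MAX(id)
      FROM replication_events
      WHERE state = 'completed'
      GROUP BY repository
  )
"""


def remove_processed_events(conn):
    """Run one cleanup pass and return the number of deleted rows."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(CLEANUP).rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO replication_events (id, repository, state) VALUES (?, ?, ?)",
        [
            (1, "repo-a", "completed"),
            (2, "repo-a", "dead"),
            (3, "repo-a", "completed"),    # latest completed for repo-a: kept
            (4, "repo-b", "completed"),    # latest completed for repo-b: kept
            (5, "repo-b", "in_progress"),  # unprocessed: never touched
            (6, "repo-c", "dead"),
        ],
    )
    deleted = remove_processed_events(conn)
    remaining = [
        row[0]
        for row in conn.execute("SELECT id FROM replication_events ORDER BY id")
    ]
    return deleted, remaining
```

The same statement could serve either of the first two options: a periodic background task could call `remove_processed_events` on a timer, while direct deletion could run an equivalent, repository-scoped delete when a job reaches the completed or dead state.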
Edited by James Ramsay (ex-GitLab)