Remove processed replication jobs to prevent the table from growing rapidly
Problem to solve
The Praefect database table that tracks replication jobs is never cleaned up, so it grows with every write operation. In practice this is a performance problem: it makes read distribution slow.
Further details
Once replication is done, the replication event remains in the database in the completed or dead state.
Most of those records are meaningless and could be safely removed.
The only completed events we rely on are the latest for each repository.
They are used to determine whether the repository is up to date (has no unprocessed events).
The fewer records we have in the database, the faster queries will be and the fewer resources they will consume.
Proposal
Remove jobs once they are no longer needed, to prevent the read distribution query from becoming slow.
This could be done in a number of ways:
- directly delete jobs when they are successfully completed or dead, at least for update change type jobs
- create a background task that will be executed periodically and will remove those unused records from the database
- move them into another table if we are not ready to delete them