Alerting on Loose Foreign Keys Deleted Records Processing

After identifying a recent problem with LFK processing (#419119, closed), we should add alerting on the LFK workers to make sure they are working as expected and catching up with the deleted records.

We have some Prometheus metrics that we can use for this alerting: loose_foreign_key_updates, loose_foreign_key_deletions, loose_foreign_key_incremented_deleted_records, and loose_foreign_key_rescheduled_deleted_records.
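As a rough illustration of how such metrics could feed an alert, the sketch below turns two samples of a counter into a per-second increase, loosely similar in spirit to a Prometheus rate calculation. It assumes these metrics are monotonically increasing counters, which this issue does not state. The `Sample` type, the function name, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a counter (hypothetical helper type)."""
    timestamp: float  # seconds since some reference point
    value: float      # counter value at that time


def per_second_increase(older: Sample, newer: Sample) -> float:
    """Return the average per-second increase between two counter samples."""
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("newer sample must be later than older sample")

    delta = newer.value - older.value
    if delta < 0:
        # Assumption: a drop means the counter was reset (e.g. process restart),
        # so everything counted since the reset is the newer value itself.
        delta = newer.value

    return delta / elapsed


# Example: a deletions counter went from 100 to 700 in 60 seconds.
deletions_rate = per_second_increase(Sample(0.0, 100.0), Sample(60.0, 700.0))
```

A processing rate that stays at or near zero while deleted records keep arriving would be one possible signal that the workers are not keeping up.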

Some ideas on events that we should alert on

The Goal

The end goal is to make sure that we don't accumulate a large number of pending records (status = 1) in the loose_foreign_key_deleted_records table. If we reach that state, we should be alerted so that we can look into what's wrong. At the time of writing this issue, we had 29M pending records on the CI database.
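One way to express this goal as an alert condition is sketched below: fire only when the pending backlog is above a threshold and has not shrunk over the observation window, so that a large backlog that is actively being drained does not page anyone. The function name, the threshold value, and the way pending counts are collected are all assumptions made for illustration, not part of any existing alerting setup.

```python
from typing import Sequence


def should_alert(pending_counts: Sequence[int], max_pending: int = 1_000_000) -> bool:
    """Decide whether the pending backlog warrants an alert.

    pending_counts: observations of the number of pending records
    (status = 1), ordered oldest to newest.
    max_pending: hypothetical threshold; tune per database.
    """
    if not pending_counts:
        return False  # no data, nothing to decide on

    latest = pending_counts[-1]
    if latest <= max_pending:
        return False  # backlog is within the accepted limit

    # Above the threshold: alert only if the workers are not catching up,
    # i.e. the backlog did not shrink between the first and last observation.
    return latest >= pending_counts[0]


# Example: a 29M backlog that is not moving would trigger an alert.
stuck = should_alert([29_000_000, 29_000_000, 29_000_000])
```

Whether a shrinking but still very large backlog should also alert is a design choice; a second, higher threshold could cover that case.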

An example of a similar alerting MR related to another topic (Consistency Checking): gitlab-com/runbooks!5646 (merged)

References

Edited by Omar Qunsul