Skip to content

LooseForeignKeys::CleanupWorker is falling behind while processing CI LFK_deleted_records

At the time of writing this issue, Loose Foreign Keys Cleanup Worker still has around 25M records of deleted records to process on the ci database. Most of them are caused by deletions on the ci_builds

gitlabhq_dblab=# select count(*), fully_qualified_table_name from loose_foreign_keys_deleted_records where status = 1 group by fully_qualified_table_name; 
  count   | fully_qualified_table_name 
----------+----------------------------
     4523 | public.ci_pipelines
       13 | public.ci_runner_machines
  1292217 | public.ci_runners
 23951466 | public.p_ci_builds
(4 rows)

The worker is always hitting the time limit which is 30 seconds. See logs (internal)

As a result, the loose_foreign_keys_deleted_records, which is a partitioned table, has so many partitions that cannot be deleted

Click to expand
gitlabhq_dblab=# \d+ loose_foreign_keys_deleted_records;
                                                             Partitioned table "public.loose_foreign_keys_deleted_records"
           Column           |           Type           | Collation | Nullable |                            Default                             | Storage  | Stats target | Description 
----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+--------------+-------------
 id                         | bigint                   |           | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain    |              | 
 partition                  | bigint                   |           | not null | 561                                                            | plain    |              | 
 primary_key_value          | bigint                   |           | not null |                                                                | plain    |              | 
 status                     | smallint                 |           | not null | 1                                                              | plain    |              | 
 created_at                 | timestamp with time zone |           | not null | now()                                                          | plain    |              | 
 fully_qualified_table_name | text                     |           | not null |                                                                | extended |              | 
 consume_after              | timestamp with time zone |           |          | now()                                                          | plain    |              | 
 cleanup_attempts           | smallint                 |           |          | 0                                                              | plain    |              | 
Partition key: LIST (partition)
Indexes:
    "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
    "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
Check constraints:
    "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_516 FOR VALUES IN ('516'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_517 FOR VALUES IN ('517'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_518 FOR VALUES IN ('518'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_519 FOR VALUES IN ('519'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_520 FOR VALUES IN ('520'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_521 FOR VALUES IN ('521'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_522 FOR VALUES IN ('522'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_523 FOR VALUES IN ('523'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_524 FOR VALUES IN ('524'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_525 FOR VALUES IN ('525'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_526 FOR VALUES IN ('526'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_527 FOR VALUES IN ('527'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_528 FOR VALUES IN ('528'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_529 FOR VALUES IN ('529'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_530 FOR VALUES IN ('530'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_531 FOR VALUES IN ('531'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_532 FOR VALUES IN ('532'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_533 FOR VALUES IN ('533'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_534 FOR VALUES IN ('534'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_535 FOR VALUES IN ('535'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_536 FOR VALUES IN ('536'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_537 FOR VALUES IN ('537'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_538 FOR VALUES IN ('538'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_539 FOR VALUES IN ('539'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_540 FOR VALUES IN ('540'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_541 FOR VALUES IN ('541'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_542 FOR VALUES IN ('542'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_543 FOR VALUES IN ('543'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_544 FOR VALUES IN ('544'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_545 FOR VALUES IN ('545'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_546 FOR VALUES IN ('546'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_547 FOR VALUES IN ('547'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_548 FOR VALUES IN ('548'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_549 FOR VALUES IN ('549'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_550 FOR VALUES IN ('550'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_551 FOR VALUES IN ('551'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_552 FOR VALUES IN ('552'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_553 FOR VALUES IN ('553'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_554 FOR VALUES IN ('554'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_555 FOR VALUES IN ('555'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_556 FOR VALUES IN ('556'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_557 FOR VALUES IN ('557'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_558 FOR VALUES IN ('558'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_559 FOR VALUES IN ('559'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_560 FOR VALUES IN ('560'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_561 FOR VALUES IN ('561')

After some analysis, see this comment it turned out that p_ci_builds has no LFK definitions, which is causing the problem

Suggested Corrective Actions, which can be split into their own issues:

  1. Giving higher chance for CI in its share of the Worker which runs every minute: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/loose_foreign_keys/cleanup_worker.rb#L39-43
  2. [Temporarily] tweaking the limit: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/loose_foreign_keys/modification_tracker.rb#L5-7
  3. Maybe we write down some Background Migrations that do extra cleanups, especially when the load on the database is low.
  4. Parallelize the worker: #378586
  5. Monitoring and Alerting 🚨 We have metrics with so much information loose_foreign_key_updates, loose_foreign_key_deletions and loose_foreign_key_rescheduled_deleted_records

References:

CC: @ahegyi

Edited by Marius Bobin