Skip to content

Loose Foreign Keys is falling behind while processing CI deleted records

Right now the partitions on the LFK table on ci looks like this

Partitioned table "public.loose_foreign_keys_deleted_records"
           Column           |           Type           | Collation | Nullable |                            Default                             | Storage  | Compression | Stats target | Description 
----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+-------------+--------------+-------------
 id                         | bigint                   |           | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain    |             |              | 
 partition                  | bigint                   |           | not null | 1321                                                           | plain    |             |              | 
 primary_key_value          | bigint                   |           | not null |                                                                | plain    |             |              | 
 status                     | smallint                 |           | not null | 1                                                              | plain    |             |              | 
 created_at                 | timestamp with time zone |           | not null | now()                                                          | plain    |             |              | 
 fully_qualified_table_name | text                     |           | not null |                                                                | extended |             |              | 
 consume_after              | timestamp with time zone |           |          | now()                                                          | plain    |             |              | 
 cleanup_attempts           | smallint                 |           |          | 0                                                              | plain    |             |              | 
Partition key: LIST (partition)
Indexes:
    "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
    "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
Check constraints:
    "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1296 FOR VALUES IN ('1296'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1297 FOR VALUES IN ('1297'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1298 FOR VALUES IN ('1298'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1299 FOR VALUES IN ('1299'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1300 FOR VALUES IN ('1300'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1301 FOR VALUES IN ('1301'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1302 FOR VALUES IN ('1302'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1303 FOR VALUES IN ('1303'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1304 FOR VALUES IN ('1304'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1305 FOR VALUES IN ('1305'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1306 FOR VALUES IN ('1306'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1307 FOR VALUES IN ('1307'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1308 FOR VALUES IN ('1308'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1309 FOR VALUES IN ('1309'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1310 FOR VALUES IN ('1310'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1311 FOR VALUES IN ('1311'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1312 FOR VALUES IN ('1312'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1313 FOR VALUES IN ('1313'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1314 FOR VALUES IN ('1314'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1315 FOR VALUES IN ('1315'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1316 FOR VALUES IN ('1316'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1317 FOR VALUES IN ('1317'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1318 FOR VALUES IN ('1318'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1319 FOR VALUES IN ('1319'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1320 FOR VALUES IN ('1320'),
            gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_1321 FOR VALUES IN ('1321')

The number of records are close to

gitlabhq_dblab=# select count(*), fully_qualified_table_name from loose_foreign_keys_deleted_records where status = 1 group by fully_qualified_table_name order by count desc;
   count   |  fully_qualified_table_name  
-----------+------------------------------
 116812985 | public.p_ci_builds
     13205 | public.p_ci_pipelines
       289 | public.ci_runner_machines
        47 | public.ci_runners
        44 | public.ci_pipeline_artifacts
         4 | public.ci_triggers

It's worth mentioning that we have enabled the turbo mode for CI LFK. https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/feature_flags/ops/loose_foreign_keys_turbo_mode_ci.yml. So the Worker is allowed to spend more time to process the CI records.

Proposed solution

Similar to !200333 (merged), we can create another dedicated worker only to process p_ci_pipelines and p_ci_builds. And make sure that this worker runs on the CI::ApplicationRecord database.

But we also need to add specs to make sure each parent table is processed by 1 worker only. Because workers process delete the records and mark them as processed. Context here

CC: @morefice @ahegyi

Edited by 🤖 GitLab Bot 🤖