Check for accumulation of unprocessed loose foreign keys deleted records
Adding a new check
This check looks for an accumulation (over 10000) of unprocessed entries in the loose_foreign_keys_deleted_records table that may result in negative performance impacts on the system in a variety of ways. This check notifies the user if there is a large accumulation and refers them to this page with a description of the problem and some possible mitigation approaches.
Verification steps for review
Because this check is only relevant on heavily-used systems it's hard to get a sufficient test case to check, but we simulated a system with a small table of loose foreign keys and manually set two of them to status=1 to test the check with a lower trigger case:
gitlabhq_production=# select * from loose_foreign_keys_deleted_records;
id | partition | primary_key_value | status | created_at | fully_qualified_table_name | consume_after | cleanup_attempts
------+-----------+-------------------+--------+-------------------------------+----------------------------+-------------------------------+------------------
2715 | 203 | 1810 | 2 | 2025-11-04 01:00:30.277415+00 | public.ci_runner_machines | 2025-11-04 01:00:30.277415+00 | 0
2717 | 203 | 1809 | 2 | 2025-11-04 01:00:30.36289+00 | public.ci_runner_machines | 2025-11-04 01:00:30.36289+00 | 0
2712 | 203 | 1304 | 2 | 2025-11-04 01:00:19.36852+00 | public.ci_runners | 2025-11-04 01:00:19.36852+00 | 0
2714 | 203 | 1307 | 2 | 2025-11-04 01:00:30.277415+00 | public.ci_runners | 2025-11-04 01:00:30.277415+00 | 0
2716 | 203 | 1306 | 2 | 2025-11-04 01:00:30.36289+00 | public.ci_runners | 2025-11-04 01:00:30.36289+00 | 0
2718 | 203 | 1780 | 2 | 2025-11-04 15:36:05.018959+00 | public.ci_runner_machines | 2025-11-04 15:36:05.018959+00 | 0
2719 | 203 | 1782 | 2 | 2025-11-04 15:36:05.018959+00 | public.ci_runner_machines | 2025-11-04 15:36:05.018959+00 | 0
2720 | 203 | 1781 | 2 | 2025-11-04 15:36:05.018959+00 | public.ci_runner_machines | 2025-11-04 15:36:05.018959+00 | 0
2713 | 203 | 1806 | 1 | 2025-11-04 01:00:19.36852+00 | public.ci_runner_machines | 2025-11-04 01:00:19.36852+00 | 0
2711 | 203 | 1779 | 1 | 2025-11-03 23:36:02.093413+00 | public.ci_runner_machines | 2025-11-03 23:36:02.093413+00 | 0
(10 rows)
command: (value=$(sudo /opt/gitlab/bin/gitlab-psql -At -c "select count(*) from loose_foreign_keys_deleted_records WHERE status = 1;") && [ $value -ge 1 ] )
spot --ssh-agent -v -p all_playbook.yml -u admin -k ~/.ssh/id_rsa -e GITLAB_VERSION:17.11.1 -n "20251103_135 - run check"
spot v1.19.1-74e1afa-2025-08-29T17:32:47Z
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] run task "20251103_135 - run check", commands: 1
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] run command "Run the check and store result"
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] > sudo /bin/sh -c /tmp/.spot-2189275476069177344/spot-script1011991653
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] > setvar JSON_DATA_20251103_135_all_diana={ "ref_url": "https://gitlab.com/gitlab-org/gitlab/-/issues/419119", "title": "Check for accumulation of loose foreign keys deleted records", "host": "ec2-54-153-2-60.us-west-1.compute.amazonaws.com", "workaround_url": "https://docs.gitlab.com/development/database/loose_foreign_keys/#accumulation-of-deleted-records", "version_started": "14.0.0", "version_fixed": null, "message": "You have a significant accumulation of unprocessed loose foreign keys deleted records.\n This can cause LooseForeignKeys::CleanupWorker timeouts and performance issues.\n Consider investigating the cleanup process and potentially running manual cleanup\n if the accumulation is severe." }
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] completed command "Run the check and store result" {script: /bin/sh -c [multiline script]} (1.179s)
[diana ec2-54-153-2-60.us-west-1.compute.amazonaws.com:22] completed task "20251103_135 - run check", commands: 1 (1.571s)
Closes #135
Author checklist
- After opening the MR:
-
Set it to the current milestone -
Ask the Maintainer from the Reviewer roulettesuggestion for review
-
Reviewer checklist
-
I followed the verification steps and confirm the functionality of the new check -
I executed the check as presented in this MR by running the generated playbook with spot - In case of unexpected/odd behavior here, verify the generated playbook to account for potential YAML parsing issues
-
-
This check does only perform read operations -
This check does not output more than necessary on stdout for the check to function -
The messageexplains what it means when this check does not pass -
The workaround_urlprovides actionable information/steps for affected users- Consider if a Knowledge Base article should exist to serve as the ideal workaround URL
-
This check is not using the Rails console/runner, or has Maintainer approval for doing so -
This check is not using a Rake task, or has Maintainer approval for doing so -
If this is a breaking change check: -
It has the corresponding xx_breaking_changestag (xx being the major release version for the change) -
The workaround_urlgoes to the entry on the https://docs.gitlab.com/update/deprecations/ page -
The ref_urlgoes to the deprecation issue linked from that entry -
The titleis the same as that entry -
The version_startedis equal to theannouncement_milestoneof the deprecation -
The version_fixedis equal to theremoval_milestoneof the deprecation
-
Edited by Diana Stanley