Skip to content

Drop Vulnerabilities without `finding_id`

What does this MR do and why?

tl;dr rows in vulnerabilities without finding_id indicate invalid data and we need to remove them

When working on Proper 1:1 relationship between Vulnerabilities... (&11030 - closed) we had to backfill the finding_id data based on our vulnerability_ocurrences table. After the process finished we realized we have exactly 6300 row for which finding_id could not be populated. This indicates invalid data and we need to remove those rows before validating the finding_id foreign key. The latest 10 invalid records were created back in April 2022 so I'm sure we're no longer creating invalid records

SQL
gitlabhq_production=> SELECT created_at FROM vulnerabilities WHERE finding_id IS NULL ORDER BY created_at DESC LIMIT 10;
          created_at           
-------------------------------
 2022-04-20 20:34:08.328467+00
 2022-04-20 20:33:43.124563+00
 2022-04-20 20:33:41.319695+00
 2022-04-20 20:33:33.750897+00
 2022-04-20 20:33:06.104165+00
 2022-04-20 20:32:21.725177+00
 2022-04-20 20:32:14.516929+00
 2022-04-20 20:32:07.394365+00
 2022-04-20 20:31:40.921779+00
 2022-04-20 20:31:38.651027+00

To paint a complete picture – we actually have far more orphaned rows in vulnerability_occurrences table:

SQL
gitlabhq_production=> SELECT COUNT(*) FROM vulnerability_occurrences WHERE vulnerability_id IS NULL;
  count  
---------
 1305804
(1 row)

However it's okay to keep them there for now. We will drop them once the migration to 1:1 relationship is complete.

Related to #436418 (closed)

Edited by Michał Zając

Merge request reports