Remove duplicated container scanning findings
What does this MR do?
This MR removes rows that are created due to corrupt data. Please refer issues above for details
Fix migration that we ran couldn't update these entities because moment we updated location_fingerprint it violates unique index (project_id, primary_identifier_id, location_fingerprint, scanner_id)
So we need to go through these records encode location_fingeprint
column property check rest of the records and delete recently created entity.
Queries
SELECT id, project_id, primary_identifier_id, location_fingerprint, scanner_id
FROM "vulnerability_occurrences"
WHERE "vulnerability_occurrences"."id" BETWEEN 231411 AND 2312411 AND "vulnerability_occurrences"."report_type" = 2 AND (length(location_fingerprint) = 40)
https://explain.depesz.com/s/b2vQ
ime: 17.265 ms
- planning: 0.379 ms
- execution: 16.886 ms
- I/O read: 1.202 ms
- I/O write: 0.000 ms
UPDATE "vulnerability_occurrences"
SET "location_fingerprint" = '\x43dcc8b28550993b1f8c1ca2d09d324e26bf5ac4'
WHERE "vulnerability_occurrences"."id" = 239934
https://explain.depesz.com/s/iCE6O
SELECT "vulnerability_occurrences".*
FROM "vulnerability_occurrences"
WHERE "vulnerability_occurrences"."report_type" = 2
AND "vulnerability_occurrences"."project_id" = 224302
AND "vulnerability_occurrences"."primary_identifier_id" = 6309
AND "vulnerability_occurrences"."scanner_id" = 1643
AND "vulnerability_occurrences"."location_fingerprint" = '\x6c871440eb9f7618b9aef25e5246acddff6ed7a1'
AND (
created_at > '2020-09-01 14:12:50.453892')
https://explain.depesz.com/s/GEps
Time: 9.101 ms
- planning: 0.373 ms
- execution: 8.728 ms
- I/O read: 8.658 ms
- I/O write: 0.000 ms
explain DELETE FROM "vulnerabilities" WHERE "vulnerabilities"."id" = 1000002
https://explain.depesz.com/s/I5J
Time: 175.848 ms
- planning: 0.234 ms
- execution: 175.614 ms
- I/O read: 12.622 ms
- I/O write: 0.000 ms
explain DELETE FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."id" = 1
https://explain.depesz.com/s/Pcn2
Time: 668.960 ms
- planning: 0.267 ms
- execution: 668.693 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
explain DELETE FROM "notes" WHERE "notes"."project_id" = 278964 AND "notes"."noteable_type" = 'Vulnerability' AND "notes"."noteable_id" = 1436971
https://explain.depesz.com/s/QQdC
Time: 0.298 ms
- planning: 0.200 ms
- execution: 0.098 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Related cascade deletes
ALTER TABLE ONLY public.vulnerability_occurrences
ADD CONSTRAINT fk_97ffe77653 FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE SET NULL;
----
ALTER TABLE ONLY public.vulnerability_occurrence_identifiers
ADD CONSTRAINT fk_rails_e4ef6d027c FOREIGN KEY (occurrence_id) REFERENCES public.vulnerability_occurrences(id) ON DELETE CASCADE;
ALTER TABLE ONLY public.vulnerability_occurrence_pipelines
ADD CONSTRAINT fk_rails_dc3ae04693 FOREIGN KEY (occurrence_id) REFERENCES public.vulnerability_occurrences(id) ON DELETE CASCADE;
----
ALTER TABLE ONLY public.vulnerability_user_mentions
ADD CONSTRAINT fk_rails_1a41c485cd FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE CASCADE;
ALTER TABLE ONLY public.vulnerability_issue_links
ADD CONSTRAINT fk_rails_d459c19036 FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE CASCADE;
Estimated times
-
49 ms for select statement with 1000 items
-
32 ms each update statement
-
850 ms delete statements
-
9 ms lookup
-
23_893 /1000 = 24 loop
-
1000 * 900ms = 900_000 ms each loops update time
-
900_000 * 24 = 20_700 (~345 min)
-
48 (waiting min) + 345 = 393 min