Skip to content

Remove duplicated container scanning findings

Can Eldem requested to merge remove-duplicated-cs-findings into master

What does this MR do?

This MR removes rows that are created due to corrupt data. Please refer issues above for details

Fix migration that we ran couldn't update these entities because moment we updated location_fingerprint it violates unique index (project_id, primary_identifier_id, location_fingerprint, scanner_id)

So we need to go through these records encode location_fingeprint column property check rest of the records and delete recently created entity.

Queries


SELECT id, project_id, primary_identifier_id, location_fingerprint, scanner_id
FROM "vulnerability_occurrences"
WHERE "vulnerability_occurrences"."id" BETWEEN 231411 AND 2312411 AND "vulnerability_occurrences"."report_type" = 2 AND (length(location_fingerprint) = 40)

https://explain.depesz.com/s/b2vQ

ime: 17.265 ms
  - planning: 0.379 ms
  - execution: 16.886 ms
    - I/O read: 1.202 ms
    - I/O write: 0.000 ms
UPDATE "vulnerability_occurrences" 
SET    "location_fingerprint" = '\x43dcc8b28550993b1f8c1ca2d09d324e26bf5ac4' 
WHERE  "vulnerability_occurrences"."id" = 239934 

https://explain.depesz.com/s/iCE6O

SELECT "vulnerability_occurrences".* 
FROM   "vulnerability_occurrences" 
WHERE  "vulnerability_occurrences"."report_type" = 2 
AND    "vulnerability_occurrences"."project_id" = 224302 
AND    "vulnerability_occurrences"."primary_identifier_id" = 6309 
AND    "vulnerability_occurrences"."scanner_id" = 1643 
AND    "vulnerability_occurrences"."location_fingerprint" = '\x6c871440eb9f7618b9aef25e5246acddff6ed7a1'
AND    ( 
              created_at > '2020-09-01 14:12:50.453892')

https://explain.depesz.com/s/GEps

Time: 9.101 ms
  - planning: 0.373 ms
  - execution: 8.728 ms
    - I/O read: 8.658 ms
    - I/O write: 0.000 ms
explain  DELETE FROM "vulnerabilities" WHERE "vulnerabilities"."id" = 1000002

https://explain.depesz.com/s/I5J

Time: 175.848 ms
  - planning: 0.234 ms
  - execution: 175.614 ms
    - I/O read: 12.622 ms
    - I/O write: 0.000 ms
explain DELETE FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."id" = 1

https://explain.depesz.com/s/Pcn2

Time: 668.960 ms
  - planning: 0.267 ms
  - execution: 668.693 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms
explain DELETE FROM "notes" WHERE "notes"."project_id" = 278964 AND "notes"."noteable_type" = 'Vulnerability' AND "notes"."noteable_id" = 1436971

https://explain.depesz.com/s/QQdC

Time: 0.298 ms
  - planning: 0.200 ms
  - execution: 0.098 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Related cascade deletes


ALTER TABLE ONLY public.vulnerability_occurrences
    ADD CONSTRAINT fk_97ffe77653 FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE SET NULL;

---- 
ALTER TABLE ONLY public.vulnerability_occurrence_identifiers
    ADD CONSTRAINT fk_rails_e4ef6d027c FOREIGN KEY (occurrence_id) REFERENCES public.vulnerability_occurrences(id) ON DELETE CASCADE;


ALTER TABLE ONLY public.vulnerability_occurrence_pipelines
    ADD CONSTRAINT fk_rails_dc3ae04693 FOREIGN KEY (occurrence_id) REFERENCES public.vulnerability_occurrences(id) ON DELETE CASCADE;

----

ALTER TABLE ONLY public.vulnerability_user_mentions
    ADD CONSTRAINT fk_rails_1a41c485cd FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE CASCADE;

ALTER TABLE ONLY public.vulnerability_issue_links
    ADD CONSTRAINT fk_rails_d459c19036 FOREIGN KEY (vulnerability_id) REFERENCES public.vulnerabilities(id) ON DELETE CASCADE;

Estimated times

  • 49 ms for select statement with 1000 items

  • 32 ms each update statement

  • 850 ms delete statements

  • 9 ms lookup

  • 23_893 /1000 = 24 loop

  • 1000 * 900ms = 900_000 ms each loops update time

  • 900_000 * 24 = 20_700 (~345 min)

  • 48 (waiting min) + 345 = 393 min

Conformity

Edited by Can Eldem

Merge request reports