Skip to content

Query was duplicating results

rossfuhrman requested to merge 11460-duplicate-vulnerabilities-returned into master

What does this MR do?

This change fixes the problem where results were being duplicated on the group security dashboard.

Previous, incorrect query and query plan:

explain analyze SELECT vulnerability_occurrences.*, ci_pipelines.sha FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" INNER JOIN "vulnerability_occurrence_pipelines" "occurrence_pipelines_vulnerability_occurrences_join" ON "occurrence_pipelines_vulnerability_occurrences_join"."occurrence_id" = "vulnerability_occurrences"."id" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "occurrence_pipelines_vulnerability_occurrences_join"."pipeline_id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT max(id) as id FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-orgg%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success' GROUP BY "ci_pipelines"."project_id")



                                                                                                                          QUERY PLAN                                                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=10.73..39.22 rows=52 width=843) (actual time=0.157..0.157 rows=0 loops=1)
   Hash Cond: (occurrence_pipelines_vulnerability_occurrences_join.pipeline_id = ci_pipelines.id)
   ->  Nested Loop  (cost=5.46..33.23 rows=52 width=806) (actual time=0.089..0.089 rows=0 loops=1)
         ->  Nested Loop  (cost=5.19..21.71 rows=36 width=810) (actual time=0.087..0.087 rows=0 loops=1)
               ->  Nested Loop  (cost=4.91..7.91 rows=36 width=8) (actual time=0.086..0.086 rows=0 loops=1)
                     ->  HashAggregate  (cost=4.64..4.65 rows=1 width=4) (actual time=0.086..0.086 rows=0 loops=1)
                           Group Key: max(ci_pipelines_1.id)
                           ->  GroupAggregate  (cost=4.60..4.62 rows=1 width=8) (actual time=0.085..0.085 rows=0 loops=1)
                                 Group Key: ci_pipelines_1.project_id
                                 ->  Sort  (cost=4.60..4.61 rows=1 width=8) (actual time=0.085..0.085 rows=0 loops=1)
                                       Sort Key: ci_pipelines_1.project_id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Nested Loop Semi Join  (cost=3.75..4.59 rows=1 width=8) (actual time=0.050..0.050 rows=0 loops=1)
                                             ->  Nested Loop  (cost=3.48..3.66 rows=3 width=8) (actual time=0.050..0.050 rows=0 loops=1)
                                                   Join Filter: (ci_pipelines_1.project_id = rs.source_id)
                                                   ->  HashAggregate  (cost=3.34..3.35 rows=1 width=8) (actual time=0.050..0.050 rows=0 loops=1)
                                                         Group Key: projects.id
                                                         ->  Nested Loop  (cost=0.28..3.33 rows=1 width=8) (actual time=0.049..0.049 rows=0 loops=1)
                                                               ->  Index Scan using index_routes_on_path_text_pattern_ops on routes rs  (cost=0.14..2.17 rows=1 width=4) (actual time=0.049..0.049 rows=0 loops=1)
                                                                     Index Cond: (((path)::text ~>=~ 'gitlab-orgg'::text) AND ((path)::text ~<~ 'gitlab-orgh'::text))
                                                                     Filter: (((path)::text ~~ 'gitlab-orgg%'::text) AND ((source_type)::text = 'Project'::text))
                                                               ->  Index Only Scan using idx_projects_on_repository_storage_last_repository_updated_at on projects  (cost=0.14..1.16 rows=1 width=4) (never executed)
                                                                     Index Cond: (id = rs.source_id)
                                                                     Heap Fetches: 0
                                                   ->  Index Only Scan using index_ci_pipelines_on_project_id_and_ref_and_status_and_id on ci_pipelines ci_pipelines_1  (cost=0.14..0.28 rows=3 width=8) (never executed)
                                                         Index Cond: ((project_id = projects.id) AND (status = 'success'::text))
                                                         Heap Fetches: 0
                                             ->  Index Only Scan using index_vulnerability_occurrence_pipelines_on_pipeline_id on vulnerability_occurrence_pipelines vulnerability_occurrence_pipelines_1  (cost=0.28..1.08 rows=36 width=4) (never executed)
                                                   Index Cond: (pipeline_id = ci_pipelines_1.id)
                                                   Heap Fetches: 0
                     ->  Index Scan using index_vulnerability_occurrence_pipelines_on_pipeline_id on vulnerability_occurrence_pipelines  (cost=0.28..2.91 rows=36 width=12) (never executed)
                           Index Cond: (pipeline_id = (max(ci_pipelines_1.id)))
               ->  Index Scan using vulnerability_occurrences_pkey on vulnerability_occurrences  (cost=0.28..0.37 rows=1 width=802) (never executed)
                     Index Cond: (id = vulnerability_occurrence_pipelines.occurrence_id)
         ->  Index Only Scan using vulnerability_occurrence_pipelines_on_unique_keys on vulnerability_occurrence_pipelines occurrence_pipelines_vulnerability_occurrences_join  (cost=0.28..0.31 rows=1 width=12) (never executed)
               Index Cond: (occurrence_id = vulnerability_occurrence_pipelines.occurrence_id)
               Heap Fetches: 0
   ->  Hash  (cost=4.01..4.01 rows=101 width=45) (actual time=0.055..0.055 rows=104 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on ci_pipelines  (cost=0.00..4.01 rows=101 width=45) (actual time=0.013..0.033 rows=104 loops=1)
 Planning time: 6.377 ms
 Execution time: 0.391 ms
(42 rows)

Updated, correct query and query plan:

explain analyze SELECT DISTINCT vulnerability_occurrences.*, ci_pipelines.sha FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" INNER JOIN "vulnerability_occurrence_pipelines" "occurrence_pipelines_vulnerability_occurrences_join" ON "occurrence_pipelines_vulnerability_occurrences_join"."occurrence_id" = "vulnerability_occurrences"."id" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "occurrence_pipelines_vulnerability_occurrences_join"."pipeline_id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT max(id) as id FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-org/%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success' GROUP BY "ci_pipelines"."project_id");



QUERY PLAN                                                                                                                                                                                                                                                                                                                
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=42.15..42.67 rows=52 width=843) (actual time=0.553..0.557 rows=21 loops=1)
   Group Key: vulnerability_occurrences.id, vulnerability_occurrences.created_at, vulnerability_occurrences.updated_at, vulnerability_occurrences.severity, vulnerability_occurrences.confidence, vulnerability_occurrences.report_type, vulnerability_occurrences.project_id, vulnerability_occurrences.scanner_id, vulnerability_occurrences.primary_identifier_id, vulnerability_occurrences.project_fingerprint, vulnerability_occurrences.location_fingerprint, vulnerability_occurrences.uuid, vulnerability_occurrences.name, vulnerability_occurrences.metadata_version, vulnerability_occurrences.raw_metadata, ci_pipelines.sha
   ->  Hash Join  (cost=11.59..40.07 rows=52 width=843) (actual time=0.338..0.445 rows=42 loops=1)
         Hash Cond: (occurrence_pipelines_vulnerability_occurrences_join.pipeline_id = ci_pipelines.id)
         ->  Nested Loop  (cost=6.32..34.09 rows=52 width=806) (actual time=0.271..0.368 rows=42 loops=1)
               ->  Nested Loop  (cost=6.04..22.56 rows=36 width=810) (actual time=0.266..0.322 rows=21 loops=1)
                     ->  Nested Loop  (cost=5.77..8.77 rows=36 width=8) (actual time=0.263..0.273 rows=21 loops=1)
                           ->  HashAggregate  (cost=5.49..5.50 rows=1 width=4) (actual time=0.261..0.261 rows=1 loops=1)
                                 Group Key: max(ci_pipelines_1.id)
                                 ->  GroupAggregate  (cost=5.46..5.48 rows=1 width=8) (actual time=0.259..0.259 rows=1 loops=1)
                                       Group Key: ci_pipelines_1.project_id
                                       ->  Sort  (cost=5.46..5.46 rows=1 width=8) (actual time=0.257..0.257 rows=2 loops=1)
                                             Sort Key: ci_pipelines_1.project_id
                                             Sort Method: quicksort  Memory: 25kB
                                             ->  Nested Loop Semi Join  (cost=4.61..5.45 rows=1 width=8) (actual time=0.226..0.234 rows=2 loops=1)
                                                   ->  Nested Loop  (cost=4.34..4.52 rows=3 width=8) (actual time=0.182..0.187 rows=2 loops=1)
                                                         Join Filter: (ci_pipelines_1.project_id = rs.source_id)
                                                         ->  HashAggregate  (cost=4.19..4.20 rows=1 width=8) (actual time=0.122..0.123 rows=7 loops=1)
                                                               Group Key: projects.id
                                                               ->  Nested Loop  (cost=0.28..4.19 rows=1 width=8) (actual time=0.094..0.113 rows=7 loops=1)
                                                                     ->  Index Scan using index_routes_on_path on routes rs  (cost=0.14..3.02 rows=1 width=4) (actual time=0.021..0.028 rows=7 loops=1)
                                                                           Index Cond: (((path)::text >= 'gitlab-org/'::text) AND ((path)::text < 'gitlab-org0'::text))
                                                                           Filter: (((path)::text ~~ 'gitlab-org/%'::text) AND ((source_type)::text = 'Project'::text))
                                                                     ->  Index Only Scan using idx_projects_on_repository_storage_last_repository_updated_at on projects  (cost=0.14..1.16 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=7)
                                                                           Index Cond: (id = rs.source_id)
                                                                           Heap Fetches: 7
                                                         ->  Index Only Scan using index_ci_pipelines_on_project_id_and_ref_and_status_and_id on ci_pipelines ci_pipelines_1  (cost=0.14..0.28 rows=3 width=8) (actual time=0.007..0.007 rows=0 loops=7)
                                                               Index Cond: ((project_id = projects.id) AND (status = 'success'::text))
                                                               Heap Fetches: 2
                                                   ->  Index Only Scan using index_vulnerability_occurrence_pipelines_on_pipeline_id on vulnerability_occurrence_pipelines vulnerability_occurrence_pipelines_1  (cost=0.28..1.08 rows=36 width=4) (actual time=0.022..0.022 rows=1 loops=2)
                                                         Index Cond: (pipeline_id = ci_pipelines_1.id)
                                                         Heap Fetches: 2
                           ->  Index Scan using index_vulnerability_occurrence_pipelines_on_pipeline_id on vulnerability_occurrence_pipelines  (cost=0.28..2.91 rows=36 width=12) (actual time=0.002..0.009 rows=21 loops=1)
                                 Index Cond: (pipeline_id = (max(ci_pipelines_1.id)))
                     ->  Index Scan using vulnerability_occurrences_pkey on vulnerability_occurrences  (cost=0.28..0.37 rows=1 width=802) (actual time=0.001..0.002 rows=1 loops=21)
                           Index Cond: (id = vulnerability_occurrence_pipelines.occurrence_id)
               ->  Index Only Scan using vulnerability_occurrence_pipelines_on_unique_keys on vulnerability_occurrence_pipelines occurrence_pipelines_vulnerability_occurrences_join  (cost=0.28..0.31 rows=1 width=12) (actual time=0.001..0.002 rows=2 loops=21)
                     Index Cond: (occurrence_id = vulnerability_occurrence_pipelines.occurrence_id)
                     Heap Fetches: 42
         ->  Hash  (cost=4.01..4.01 rows=101 width=45) (actual time=0.052..0.052 rows=104 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on ci_pipelines  (cost=0.00..4.01 rows=101 width=45) (actual time=0.011..0.028 rows=104 loops=1)
 Planning time: 2.832 ms
 Execution time: 0.817 ms
(44 rows)

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #11460 (closed)

Edited by Mayra Cabrera

Merge request reports