Vulnerability report fails to load for large groups/projects

Summary

The group-level and project-level dashboards are both failing with an internal server error.

image

The instance query works:

image

Steps to reproduce

Use graphql explorer to run the queries above.

Example Project

gitlab-org/gitlab

What is the current bug behavior?

500 error

What is the expected correct behavior?

Response with data.

Relevant logs and/or screenshots

Output of checks

Results of GitLab environment info

Expand for output related to GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

Expand for output related to the GitLab application check

(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:check SANITIZE=true)

(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)

(we will only investigate if the tests are passing)

Possible fixes

[5:21 PM] I think the problem is, it’s trying to load all the vulnerability_occurrences records for all the scanners into the memory and then doing the distinct on them. Since we have lots of records in the vulnerability_occurrences table, it’s causing the read amplification.

[5:21 PM] The original query is running around ~350ms on postgres.ai but the one with the lateral join is running around ~13ms.

SELECT 
  DISTINCT ON ("vulnerability_scanners"."external_id", "report_types"."report_type")
  "vulnerability_scanners".*,
  report_types.report_type
FROM "vulnerability_scanners"
INNER JOIN LATERAL (SELECT report_type FROM vulnerability_occurrences WHERE vulnerability_occurrences.scanner_id = vulnerability_scanners.id LIMIT 1) AS report_types ON TRUE
WHERE 
  "vulnerability_scanners"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 9970) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces") AND "projects"."archived" = FALSE AND "projects"."pending_delete" = FALSE)
ORDER BY 
  "vulnerability_scanners"."external_id" ASC;

/cc @mparuszewski @dftian @matt_wilson @lkerr