Skip to content

Fix Secure feature API fetching group projects with security reports

Discovered in !62092 (closed) we found ee/app/finders/ee/group_projects_finder.rb joins between CI and non-CI tables when the with_security_reports is called. It looks like this finder uses the :with_security_reports scope.

It seems we have a group project API where we can filter a group's projects by whether it has with_security_reports, or not.

The query that is failing in question:

SELECT "projects".* FROM ((SELECT "projects".* FROM "projects" WHERE "projects"."namespace_id" = 8 AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 3 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)))
UNION
(SELECT "projects".* FROM "projects" INNER JOIN "project_group_links" ON "projects"."id" = "project_group_links"."project_id" WHERE "project_group_links"."group_id" = 8 AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 3 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)))) projects WHERE "projects"."pending_delete" = FALSE AND (EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8, 21, 23, 26, 27) AND (ci_job_artifacts.project_id = projects.id))) ORDER BY "projects"."id" DESC

Failing spec

https://gitlab.com/gitlab-org/gitlab/-/jobs/1440069680

Options

  1. Introduce a new table to store whether or not a project has security reports (de-normalization) and UPSERT into this table whenever we create security reports. Could be called projects_with_security_reports and only needs to contain the project_id for this use case.
  2. Change this sub-select to 3 separate queries. Firstly load all the relevant projects (this should be bounded by the fact that it's a group API filter) and then do another query to find the list of those that have ci_job_artifacts with file_type IN then use the filtered list of project ids to find the final list of projects. This is probably not a good option for very large groups with thousands of projects because our usage in https://gitlab.com/gitlab-org/gitlab/-/blob/d6f264fc4a623ce7892186e64cbc3c1a2507a993/ee/app/assets/javascripts/security_dashboard/store/modules/projects/actions.js is including subgroups and not filtering much. Since the ordering and pagination is on the projects it would not be possible to order and limit the first candidate set of project ids much so it would mean sending thousands of project ids in the intermediate query when ultimately we're only going to want 100. This would look something like:
    scope :with_security_reports, -> do
      project_id_currently_filtered = self.pluck(:id)
      filtered_project_ids_with_security_reports = ::Ci::JobArtifact.security_reports.where(project_id: project_id_currently_filtered).distinct.pluck(:project_id)
    
      where(id: filtered_project_ids_with_security_reports)
    end
  3. Is there some other table that already stores some metadata about security reports that isn't in the ci_* tables that we could join to? (eg. security_scans or security_findings noting that security_scans is going to be de-normalized to have a project_id column soon to avoid joining ci_builds)

Sharding team recommended option

Introduce a new table to store whether or not a project has security reports (de-normalization) and UPSERT into this table whenever we create security reports. Could be called projects_with_security_reports and only needs to contain the project_id for this use case.

Furthermore this looks a lot like #336199 (closed) so we may want to fix both in 1 go with a combined table to solve both problems called projects_with_ci_feature_usage with 2 columns project_id, ci_feature and then we can just store "code_coverage" and "security_report" in the ci_feature column when this is first used for the project.

Implementation plan

Expand for previous implementation plan
  1. Upsert a new record into the project_ci_feature_usages table when a security_report artifact is created
  2. Backfill project_ci_feature_usages table with existing project data
  3. Update EE::Project.with_security_reports to use project_ci_feature_usages table

Remove with_security_reports scope which relies on the ci_job_artifacts table, and add with_security_scans scope which uses the security_scans table, removing the cross-table join: !70031 (merged)

/cc @DylanGriffith

Edited by Adam Cohen