Skip to content

Allow group.vulnerabilities GraphQL API with owasp_top_10 filters to use unnest IN query optimization

When expanding the Non - Owasp top 10 group by category on large groups, we observe that the underlying DB query from the GraphQL API is timing out.

Request Payload:

{"operationName":"groupVulnerabilities","variables":{"first":100,"vetEnabled":true,"fullPath":"gitlab-org","sort":"severity_desc","last":null,"state":["DETECTED","CONFIRMED"],"dismissalReason":[],"hasResolution":false,"reportType":["API_FUZZING","CONTAINER_SCANNING","COVERAGE_FUZZING","DAST","DEPENDENCY_SCANNING","SAST","SECRET_DETECTION","GENERIC"],"owaspTopTen":["NONE"]},"query":"query groupVulnerabilities($fullPath: ID!, $before: String, $after: String, $first: Int = 20, $last: Int, $projectId: [ID!], $severity: [VulnerabilitySeverity!], $reportType: [VulnerabilityReportType!], $scanner: [String!], $scannerId: [VulnerabilitiesScannerID!], $state: [VulnerabilityState!], $dismissalReason: [VulnerabilityDismissalReason!], $sort: VulnerabilitySort, $hasIssues: Boolean, $hasResolution: Boolean, $hasMergeRequest: Boolean, $hasRemediations: Boolean, $vetEnabled: Boolean = false, $clusterAgentId: [ClustersAgentID!], $owaspTopTen: [VulnerabilityOwaspTop10!]) {\n  group(fullPath: $fullPath) {\n    id\n    vulnerabilities(\n      before: $before\n      after: $after\n      first: $first\n      last: $last\n      severity: $severity\n      reportType: $reportType\n      scanner: $scanner\n      scannerId: $scannerId\n      state: $state\n      dismissalReason: $dismissalReason\n      projectId: $projectId\n      sort: $sort\n      hasIssues: $hasIssues\n      hasResolution: $hasResolution\n      hasMergeRequest: $hasMergeRequest\n      hasRemediations: $hasRemediations\n      clusterAgentId: $clusterAgentId\n      owaspTopTen: $owaspTopTen\n    ) {\n      nodes {\n        ...VulnerabilityFragment\n        __typename\n      }\n      pageInfo {\n        ...PageInfo\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment VulnerabilityFragment on Vulnerability {\n  id\n  title\n  state\n  severity\n  detectedAt\n  dismissalReason\n  vulnerabilityPath\n  resolvedOnDefaultBranch\n  userNotesCount\n  falsePositive @include(if: $vetEnabled)\n  hasRemediations\n  issueLinks {\n    nodes {\n      id\n      issue {\n        id\n        iid\n        webUrl\n        webPath\n        title\n        state\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  mergeRequest {\n    id\n    webUrl\n    state\n    iid\n    __typename\n  }\n  identifiers {\n    externalType\n    name\n    __typename\n  }\n  location {\n    ...VulnerabilityLocation\n    __typename\n  }\n  project {\n    id\n    nameWithNamespace\n    __typename\n  }\n  reportType\n  scanner {\n    id\n    vendor\n    __typename\n  }\n  __typename\n}\n\nfragment VulnerabilityLocation on VulnerabilityLocation {\n  ... on VulnerabilityLocationClusterImageScanning {\n    image\n    kubernetesResource {\n      agent {\n        id\n        name\n        webPath\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  ... on VulnerabilityLocationContainerScanning {\n    image\n    __typename\n  }\n  ... on VulnerabilityLocationDependencyScanning {\n    blobPath\n    file\n    __typename\n  }\n  ... on VulnerabilityLocationSast {\n    blobPath\n    file\n    startLine\n    __typename\n  }\n  ... on VulnerabilityLocationSecretDetection {\n    blobPath\n    file\n    startLine\n    __typename\n  }\n  ... on VulnerabilityLocationDast {\n    path\n    __typename\n  }\n  __typename\n}\n\nfragment PageInfo on PageInfo {\n  hasNextPage\n  hasPreviousPage\n  startCursor\n  endCursor\n  __typename\n}\n"}

Response:

{
    "errors": [
        {
            "message": "Request timed out. Please try a less complex query or a smaller set of records."
        }
    ]
}

As observed in the explain plans, this is happening even though we have an index because the in filter unnest optimisations are not being applied to the final query.

Observations

For owasp_top_10 IS NULL condition

Without unnest optimization: https://explain.depesz.com/s/63WG#stats https://explain.depesz.com/s/BZa2#stats

With unnest optimization applied: https://explain.depesz.com/s/yxtL#stats

For owasp_top_10 = ? condition

Without unnest optimization: https://explain.depesz.com/s/tlSv#stats

With unnest optimization: https://explain.depesz.com/s/uM03#stats

Above results conclude that not only for owasp_top_10 IS NULL, we can apply the unnest optimization for with values as well owasp_top_10 = ?.

Proposal

Current query:
SELECT
  "vulnerability_reads".*
FROM
  "vulnerability_reads"
WHERE
  (traversal_ids >= '{9970}')
  AND (traversal_ids < '{9971}')
  AND "vulnerability_reads"."archived" = FALSE
  AND "vulnerability_reads"."report_type" IN (6, 2, 5, 3, 1, 0, 4, 99)
  AND "vulnerability_reads"."severity" IN (1, 2, 4, 5, 6, 7)
  AND "vulnerability_reads"."state" IN (1, 4)
  AND "vulnerability_reads"."resolved_on_default_branch" = FALSE
  AND "vulnerability_reads"."owasp_top_10" IS NULL
ORDER BY
  "vulnerability_reads"."severity" DESC,
  "vulnerability_reads"."traversal_ids" DESC,
  "vulnerability_reads"."vulnerability_id" DESC
LIMIT
  101
New query:
SELECT
  "vulnerability_reads".*
FROM
  "vulnerability_reads"
WHERE
  "vulnerability_reads"."vulnerability_id" IN (
    SELECT
      "vulnerability_reads"."vulnerability_id"
    FROM
      unnest('{6,2,5,3,1,0,4,99}' ::smallint []) AS "report_types"("report_type"),
      unnest('{1,2,4,5,6,7}' ::smallint []) AS "severities"("severity"),
      unnest('{1,4}' ::smallint []) AS "states"("state"),
      LATERAL (
        SELECT
          "vulnerability_reads"."archived",
          "vulnerability_reads"."report_type",
          "vulnerability_reads"."severity",
          "vulnerability_reads"."state",
          "vulnerability_reads"."resolved_on_default_branch",
          "vulnerability_reads"."owasp_top_10",
          "vulnerability_reads"."traversal_ids",
          "vulnerability_reads"."vulnerability_id"
        FROM
          "vulnerability_reads"
        WHERE
          (traversal_ids >= '{9970}')
          AND (traversal_ids < '{9971}')
          AND "vulnerability_reads"."archived" = FALSE
          AND "vulnerability_reads"."resolved_on_default_branch" = FALSE
          AND "vulnerability_reads"."owasp_top_10" IS NULL
          AND (
            vulnerability_reads."report_type" = "report_types"."report_type"
          )
          AND (
            vulnerability_reads."severity" = "severities"."severity"
          )
          AND (vulnerability_reads."state" = "states"."state")
        ORDER BY
          "vulnerability_reads"."severity" DESC,
          "vulnerability_reads"."traversal_ids" DESC,
          "vulnerability_reads"."vulnerability_id" DESC
        LIMIT
          101
      ) AS vulnerability_reads
    ORDER BY
      "vulnerability_reads"."severity" DESC,
      "vulnerability_reads"."traversal_ids" DESC,
      "vulnerability_reads"."vulnerability_id" DESC
    LIMIT
      101
  )
ORDER BY
  "vulnerability_reads"."severity" DESC,
  "vulnerability_reads"."traversal_ids" DESC,
  "vulnerability_reads"."vulnerability_id" DESC
LIMIT
  101

Current behaviour for owasp_top_10 IS NULL

  • Without unnest: Query is timing out as it is not using the available index. See: https://explain.depesz.com/s/BZa2#stats

  • With unnest: Query is using another index and although it does not timeout, it is still problematic as it is not making use of the owasp_top_10 group level index and because of the reason listed here it is still a concern. To force use the existing owasp_top_10 group level index we have to change null to -1 or use order by owasp_top_10 in the query as summarized here. Changing null to -1 is the more straight forward approach. See for query plan: https://explain.depesz.com/s/yxtL#stats

Solution identified:

Implementation plan:

Milestone MR's planned Status
%17.3 1. Turn off the null FF for gitlab.com and change default value of column owasp_top_10 from null to -1 with a migration.
2. Change vulnerability_reads_finder to use -1 instead of null.
1. !161823 (merged) workflowcomplete
2. !161890 (merged) workflowcomplete
%17.4 3. BBM migration to backfill null values, once this migration completes, turn on FF on gitlab.com (After this step, the timeouts should stop but the query would still take a bit longer to load for Non OWASP category on gitlab.com/gitlab-org).
4. Backport the above BBM to %17.3 as it is the last required stop. Subjected to confirmation in slack discussion.
5. Modify the existing index to use unnest optimizations to further improve performance that is acceptable within guidelines. (resolved_on_default_branch should be in front of the severity, traversal_ids, and vulnerability_id columns)
6. Enable the FF on gitlab.com and schedule FF removal in %17.6 (as finalization can only be done after a required stop which is 17.5)
3. !161962 (merged) workflowcomplete
4. Backport option is ruled out.
5. !164252 (merged) workflowcomplete
6. #473748 (comment 2098930032) workflowverification
%17.6 7. Finalize the BBM introduced in MR3/MR4. Subjected to slack discussion (if backport is agreed, we can move this and below FF removal step to %17.4)
8. Remove FF.
Edited by Bala Kumar