Restore vulnerability statuses reset by semgrep 6.7.0 bug

What does this MR do and why?

This MR addresses a bug introduced by semgrep v6.7.0, which sorts the vulnerabilities[].identifiers[] array as a result of the change made in Sort vulnerability links and identifiers (gitlab-org/security-products/analyzers/report!116 - merged) • Adam Cohen • 18.3. Because the first element of that array is treated as the primary identifier, sorting the array changes the primary identifier, which corrupts vulnerability data.

This MR adds a batched background migration that repairs the corrupted vulnerability data.

What caused this bug?

Here's the sequence of events that led to this bug:

  1. semgrep v6.6.2 is the last release before the bug occurred. This release follows the normal convention of placing the semgrep_id as the first element in vulnerabilities[].identifiers[], thereby making it the primary identifier. See gl-sast-report-semgrep-6.6.2-multiple-vulnerabilities.json for example:

    Report generated by semgrep v6.6.2:
    "identifiers": [
      {
        "type": "semgrep_id",
        "name": "bandit.B506",
        "value": "bandit.B506",
        "url": "https://semgrep.dev/r/gitlab.bandit.B506"
      },
      {
        "type": "cwe",
        "name": "CWE-502",
        "value": "502",
        "url": "https://cwe.mitre.org/data/definitions/502.html"
      },
      {
        "type": "owasp",
        "name": "A08:2021 - Software and Data Integrity Failures",
        "value": "A08:2021"
      },
      {
        "type": "owasp",
        "name": "A8:2017 - Insecure Deserialization",
        "value": "A8:2017"
      },
      {
        "type": "bandit_test_id",
        "name": "Bandit Test ID B506",
        "value": "B506"
      }
    ]
  2. semgrep v6.7.0 is then released, which introduces the bug:

    • Bumps the security report schema version from 15.1.4 to 15.2.2.

    • Sorts vulnerabilities[].identifiers[], which moves cwe or owasp to the first position in the list and makes it the new primary identifier. See gl-sast-report-semgrep-6.7.0-multiple-vulnerabilities-incorrect-primary-identifier.json:

      Report generated by semgrep v6.7.0:
      "identifiers": [
        {
          "type": "cwe",
          "name": "CWE-502",
          "value": "502",
          "url": "https://cwe.mitre.org/data/definitions/502.html"
        },
        {
          "type": "owasp",
          "name": "A08:2021 - Software and Data Integrity Failures",
          "value": "A08:2021"
        },
        {
          "type": "owasp",
          "name": "A8:2017 - Insecure Deserialization",
          "value": "A8:2017"
        },
        {
          "type": "bandit_test_id",
          "name": "Bandit Test ID B506",
          "value": "B506"
        },
        {
          "type": "semgrep_id",
          "name": "bandit.B506",
          "value": "bandit.B506",
          "url": "https://semgrep.dev/r/gitlab.bandit.B506"
        }
      ]

    This is the point at which the bug manifests: semgrep_id is no longer the primary identifier, and cwe (or owasp) has taken its place.

  3. semgrep v6.7.1 is released, which fixes this bug by ensuring that semgrep_id is placed first in vulnerabilities[].identifiers[], restoring it as the primary identifier. See gl-sast-report-semgrep-6.7.1-additional-vulnerabilities-correct-primary-identifier.json:

    Report generated by semgrep v6.7.1:
    "identifiers": [
      {
        "type": "semgrep_id",
        "name": "bandit.B506",
        "value": "bandit.B506",
        "url": "https://semgrep.dev/r/gitlab.bandit.B506"
      },
      {
        "type": "cwe",
        "name": "CWE-502",
        "value": "502",
        "url": "https://cwe.mitre.org/data/definitions/502.html"
      },
      {
        "type": "owasp",
        "name": "A08:2021 - Software and Data Integrity Failures",
        "value": "A08:2021"
      },
      {
        "type": "owasp",
        "name": "A8:2017 - Insecure Deserialization",
        "value": "A8:2017"
      },
      {
        "type": "bandit_test_id",
        "name": "Bandit Test ID B506",
        "value": "B506"
      }
    ]
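
The effect of the three releases on the primary identifier can be reproduced with a minimal Python sketch. It is an illustration only, not the analyzer's code: the function names are hypothetical, and sorting by the `value` field is an assumption that happens to reproduce the ordering shown in the v6.7.0 and v6.7.1 reports above. The actual sort key used by the report library may differ.

```python
from typing import Dict, List

Identifier = Dict[str, str]


def primary_identifier(identifiers: List[Identifier]) -> Identifier:
    """The first element of identifiers[] is treated as the primary identifier."""
    return identifiers[0]


def sort_by_value(identifiers: List[Identifier]) -> List[Identifier]:
    """Assumed v6.7.0-style behavior: plain sort, no special case for semgrep_id."""
    return sorted(identifiers, key=lambda i: i["value"])


def sort_with_semgrep_id_first(identifiers: List[Identifier]) -> List[Identifier]:
    """Assumed v6.7.1-style behavior: semgrep_id pinned first, the rest sorted."""
    return sorted(identifiers, key=lambda i: (i["type"] != "semgrep_id", i["value"]))


# Identifiers from the v6.6.2 report above (name and url fields omitted for brevity).
identifiers = [
    {"type": "semgrep_id", "value": "bandit.B506"},
    {"type": "cwe", "value": "502"},
    {"type": "owasp", "value": "A08:2021"},
    {"type": "owasp", "value": "A8:2017"},
    {"type": "bandit_test_id", "value": "B506"},
]

print(primary_identifier(identifiers)["type"])                              # semgrep_id
print(primary_identifier(sort_by_value(identifiers))["type"])               # cwe
print(primary_identifier(sort_with_semgrep_id_first(identifiers))["type"])  # semgrep_id
```

Pinning semgrep_id through the sort key keeps the rest of the list deterministically ordered while leaving the primary identifier unaffected by the sort.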

The bug has been fixed in the analyzer code. However, changing the primary identifier of the vulnerabilities has caused the following issues:

  1. When semgrep v6.7.0 is executed, the primary identifier for existing vulnerabilities is updated to cwe, and the state of resolved vulnerabilities is reset to detected. The state of confirmed and dismissed vulnerabilities is not changed.
  2. If semgrep v6.7.1 is executed, new vulnerabilities (and vulnerability findings) are created, and the state of all of these new vulnerabilities is set to detected. These new vulnerabilities are duplicates of the vulnerabilities that were last detected by semgrep v6.6.2.

This merge request restores the vulnerability data and states to the values that were present when semgrep v6.6.2 was executed.
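
The state changes described above, and the intended end result, can be summarised in a small Python model. This is a sketch under stated assumptions and not the batched background migration itself: the `snapshot` dictionary of states as of the last v6.6.2 scan is hypothetical, as are the function names and vulnerability IDs, and the duplicate vulnerabilities created by v6.7.1 are not modelled.

```python
from typing import Dict


def state_after_v6_7_0(state: str) -> str:
    """State of an existing vulnerability after a semgrep v6.7.0 scan.

    Per the description above: resolved is reset to detected, while
    confirmed and dismissed (and already detected) states are unchanged.
    """
    return "detected" if state == "resolved" else state


def restore_states(current: Dict[int, str], snapshot: Dict[int, str]) -> Dict[int, str]:
    """Return a copy of `current` with each state set back to its snapshot value.

    `snapshot` is a hypothetical record of states as of the last v6.6.2 scan;
    vulnerabilities missing from it keep their current state.
    """
    return {vuln_id: snapshot.get(vuln_id, state) for vuln_id, state in current.items()}


# Hypothetical vulnerability IDs mapped to their states under v6.6.2.
snapshot = {1: "resolved", 2: "confirmed", 3: "dismissed", 4: "detected"}

after_bug = {vuln_id: state_after_v6_7_0(state) for vuln_id, state in snapshot.items()}
restored = restore_states(after_bug, snapshot)

print(after_bug[1])          # detected
print(restored == snapshot)  # True
```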

References

Investigate automatically restoring vulnerabili... (#577229) • Adam Cohen • 18.8

Edited by Adam Cohen
