Create Gemnasium primary identifier when running continuous scanner

Glossary

  • Primary Identifier: The first identifier in finding - in other words, order matters!
  • CVS: Continuous Vulnerability Scanning - the new method of vulnerability scanning where findings are decoupled from pipelines.

Why are we doing this work

The Gemnasium analyzers create a custom primary identifier when a vulnerability is added to the security report. This is an important detail, because finding equality is based on the primary identifier. Since the primary identifier is assumed to be stable, we must re- create this custom identifier, and ensure that it is always the first identifier in the array of identifiers for an advisory. See !121607 (comment 1466014286) for more context of discovery.

This is consistent with Dependency Scanning JSON reports created by Gemnasium. See https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/f52965010d4d69d7bca39c00f9bc52cd65f7e0e7/qa/expect/js-npm/default/gl-dependency-scanning-report.json#L69-82 (excerpt):

    {
      "identifiers": [
        {
          "type": "gemnasium",
          "name": "Gemnasium-4774cd67-936f-419e-8533-ae5cfe7db9f9",
          "value": "4774cd67-936f-419e-8533-ae5cfe7db9f9",
          "url": "https://gitlab.com/gitlab-org/security-products/gemnasium-db/-/blob/v1.2.142/npm/lodash/CVE-2019-10744.yml"
        },
        {
          "type": "cve",
          "name": "CVE-2019-10744",
          "value": "CVE-2019-10744",
          "url": "https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10744"
        }
      ]
   }

Relevant links

Scope

This is scoped only to findings related to dependency scanning and does not include container scanning.

Non-functional requirements

  • Documentation:
  • Feature flag:
  • Performance:
  • Testing: Ensure specs pass after removing stubbed Gemansium identifiers in security report builder specs.

Proposal(s)

Create the identifier at runtime

Pros

  • Avoid re-exports if bugs caught in the identifier or we re-work the identifier.

Cons

  • Rewrites are always risky. Programming paradigms change between languages, and introduce the possibility that something does not match the previous implementation 1:1.

Create the identifier at export

Pros

  • Saves some cycles when generating the findings for the synthesized security report that CVS uses.
  • Can re-use the Go code since the external db components are also written in Go. This has served well, and the chance of a bug is very low.
  • You can add the permalink to the file at export time since the exporter has access to the GLAD source.
  • It usually takes less time to merge MRs in the exporter.

Cons

  • Any changes to the algorithm require another export of all GLAD advisories.

Deprecate and remove the custom identifier from the analyzers

Pros

  • Removed code is always good and reduces complexity.

Cons

  • Deprecation is not straight forward, and might even require some cleanup on our behalf. This is made even more difficult for self-hosted customers.
  • What will take its place? We'll need to select a new primary identifier, and this is non-trivial.

Implementation plan

The proposal to create the identifier at runtime, i.e. in the Rails monolith, has been chosen.

  • Update the Gitlab::VulnerabiltyScanning::FindingBuilder class so that it has a #primary_identifier private method.

  • Override this method in the child class Gitlab::VulnerabiltyScanning::DependencyScanning::FindingBuilder so that it creates the Gemnasium identifier and returns it.

    • The gemnasium primary identifier is constructed in Go originally. It's constructed like this:
    primary_identifier ||= "Gemnasium-#{advisory.advisory_xid}"
    • To prevent allocating each time this is called, memoize it.
  • Update security report builder specs to remove the gemnasium identifiers. Verify that the primary identifier is the same even after removing the stubbed identifiers.

/cc @gonzoyumo @fcatteau @ifrenkel


Verification steps

Edited by Oscar Tovar