Validate secure analyzer reports using JSON schema in QA stage

Problem to solve

The qa stage for secure analyzers currently uses a script to compare reports for equality. Now that status, start_time and end_time have been added to SAST, CS, and DS reports, we had to update compare_reports.sh to ignore scan.start_time and scan.end_time so that two reports with different timestamps can still be compared.

The pattern when adding variable fields, such as timestamps or versions, has been to delete them from the reports before comparing them, thereby ignoring any variation between the values. However, doing this may result in false negatives. For example, if someone were to remove start_time or end_time from the scanner code, the qa tests would not fail, because these fields are ignored.
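The false negative can be reproduced with a small sketch. It is written in Python rather than jq purely for readability; the helper name strip_variable_fields and the two sample reports are hypothetical, and only the field names are taken from the reports discussed in this issue.

```python
import copy

# Hypothetical helper mirroring the "delete before comparing" pattern.
def strip_variable_fields(report):
    cleaned = copy.deepcopy(report)
    scan = cleaned.get("scan", {})
    # pop() with a default silently succeeds when the key is already absent,
    # which is exactly what hides a missing field.
    scan.pop("start_time", None)
    scan.pop("end_time", None)
    return cleaned

# Expected report: timestamps are present.
expected = {"scan": {"status": "success",
                     "start_time": "2020-09-03T02:21:52",
                     "end_time": "2020-09-03T02:21:52"}}
# A regressed scanner that no longer emits the timestamps at all.
actual = {"scan": {"status": "success"}}

# The comparison succeeds even though the fields are gone.
reports_match = strip_variable_fields(expected) == strip_variable_fields(actual)
print(reports_match)
```

Because deletion treats "present with any value" and "absent" identically, the two reports compare as equal and the regression goes unnoticed.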

I attempted to solve this issue by keeping the report keys while replacing the values with placeholder text, but it made the jq filter command much more complex:

jq_filter="del(.version) |
           del(.scan.scanner.version) |
           .scan.start_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
           .scan.end_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
           del(.vulnerabilities[]|.location.image) |
           del(.vulnerabilities[].id) |
           del(.remediations[].fixes[].id) |
           .vulnerabilities |= map_values(.links |= (. // [])) |
           .vulnerabilities |= map_values(.identifiers |= (. // [])) |
           (.. | arrays) |= sort"

It also didn't make sense to change this only for the start_time and end_time fields without addressing all the other fields that are deleted, such as .scan.scanner.version or .vulnerabilities[]|.location.image.

Intended users

User experience goal

QA stage should fail when comparing reports that don't have the same keys.

Proposal

Original proposal:

Add a second comparison stage to the compare_reports.sh script to ensure that both reports have the same keys.

Current proposal: add a second validation stage to the compare_reports.sh script to ensure that the generated report JSON validates against the Security Report Schemas.

Implementation plan

Original implementation plan:

As stated in #244829 (closed), the solution could be either to nullify all the non-comparable value fields or to verify in a second validation command that the fields exist. For this purpose, a list of all currently nullified fields can be created and iterated over, verifying that each key exists in the object, either via the jq has function or via a recursive lookup that checks, in every object, whether the current path matches one of that object's keys.

  • modify compare_reports.sh to do a member field check as described above
  • test against 2 different reports (e.g. different scan times) and verify they continue to work
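The member field check could look roughly like the sketch below. It is written in Python for illustration, not as the actual change to compare_reports.sh; the names IGNORED_PATHS, has_path and missing_paths are hypothetical, the path list is only a subset of the fields deleted by the jq filter, and the sample report values are made up. Array paths such as .vulnerabilities[].id are not handled here.

```python
# Paths that the comparison ignores; an illustrative subset, not the full list.
IGNORED_PATHS = [
    ("version",),
    ("scan", "scanner", "version"),
    ("scan", "start_time"),
    ("scan", "end_time"),
]

def has_path(obj, path):
    """Return True if every key in `path` exists, similar to chained has() checks."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return False
        obj = obj[key]
    return True

def missing_paths(report, paths=IGNORED_PATHS):
    """List the ignored paths that are absent from the report."""
    return [".".join(p) for p in paths if not has_path(report, p)]

# Sample report (values are placeholders) that lacks scan.start_time.
report = {
    "version": "3.0.0",
    "scan": {"scanner": {"version": "2.1.4"}, "end_time": "2020-09-03T02:21:52"},
}
print(missing_paths(report))
```

A non-empty result would be the signal for the qa job to fail, while the existing equality comparison keeps ignoring the values of these fields.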

Further details

This change will prevent false negatives from occurring for our QA stages.

Availability & Testing

  • all (or a sample of) analyzers' qa jobs continue to run and pass as they did before

What does success look like, and how can we measure that?

Original definition of success:

When running a QA test against two reports, one of which has the following scan object:

  "scan": {
    "scanner": {
      "id": "clair",
      "name": "Clair",
      "url": "https://github.com/coreos/clair",
      "vendor": {
        "name": "GitLab"
      },
      "version": "2.1.4"
    },
    "type": "container_scanning",
    "status": "success",
    "start_time": "2020-09-03T02:21:52",
    "end_time": "2020-09-03T02:21:52"
  }

and one which doesn't have the start_time or end_time values:

  "scan": {
    "scanner": {
      "id": "clair",
      "name": "Clair",
      "url": "https://github.com/coreos/clair",
      "vendor": {
        "name": "GitLab"
      },
      "version": "2.1.4"
    },
    "type": "container_scanning",
    "status": "success"
  }

the QA stage should fail.
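One way to detect this case is to compare the sets of key paths in the two reports instead of their values. The sketch below assumes nothing about the real script: key_paths and key_differences are hypothetical names, and the scan objects are shortened versions of the two examples above.

```python
def key_paths(node, prefix=""):
    """Collect every key path in a JSON-like structure; list items share a '[]' segment."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, f"{prefix}[]")
    return paths

def key_differences(expected, actual):
    """Return (missing, unexpected) key paths of `actual` relative to `expected`."""
    exp, act = key_paths(expected), key_paths(actual)
    return sorted(exp - act), sorted(act - exp)

expected = {"scan": {"type": "container_scanning", "status": "success",
                     "start_time": "2020-09-03T02:21:52",
                     "end_time": "2020-09-03T02:21:52"}}
actual = {"scan": {"type": "container_scanning", "status": "success"}}

missing, unexpected = key_differences(expected, actual)
print(missing)
print(unexpected)
```

Here missing contains scan.end_time and scan.start_time, so a check of this kind would fail the job even though the timestamp values themselves are never compared.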

Current definition of success: when running a QA test against a report which is missing a required field, such as start_time:

  "scan": {
    "scanner": {
      "id": "clair",
      "name": "Clair",
      "url": "https://github.com/coreos/clair",
      "vendor": {
        "name": "GitLab"
      },
      "version": "2.1.4"
    },
    "type": "container_scanning",
    "status": "success"
  }

the QA stage should pass and output a warning message:

{
  'status': 'success',
  'end_time': '2021-01-25T12:09:34',
  'type': 'sast',
  'scanner': {
    'version': '2.0.15',
    'vendor': {
      'name': 'GitLab'
    },
    'url': 'https://www.dwheeler.com/flawfinder',
    'name': 'Flawfinder',
    'id': 'flawfinder'
  }
}: 'start_time' is a required property
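The warn-but-pass behaviour can be sketched as follows. This is not a JSON Schema validator: SCAN_REQUIRED is a hypothetical, heavily reduced stand-in that only models "required" lists, and required_errors and validate_with_warnings are illustrative names. A real implementation would validate against the published Security Report Schemas with a full validator.

```python
import sys

# Hypothetical, heavily reduced stand-in for the real schema: only "required" lists.
SCAN_REQUIRED = {
    "required": ["scan"],
    "properties": {
        "scan": {"required": ["start_time", "end_time", "status", "type", "scanner"]},
    },
}

def required_errors(instance, schema):
    """Yield messages for missing required properties, recursing into 'properties'."""
    if not isinstance(instance, dict):
        return
    for name in schema.get("required", []):
        if name not in instance:
            yield f"{name!r} is a required property"
    for name, subschema in schema.get("properties", {}).items():
        if name in instance:
            yield from required_errors(instance[name], subschema)

def validate_with_warnings(report, schema=SCAN_REQUIRED):
    """Print each problem to stderr and return the list; never raise, so the job still passes."""
    warnings = list(required_errors(report, schema))
    for message in warnings:
        print(f"WARNING: {message}", file=sys.stderr)
    return warnings

# Shortened version of the scan object from the warning message above.
report = {"scan": {"type": "sast", "status": "success",
                   "end_time": "2021-01-25T12:09:34",
                   "scanner": {"id": "flawfinder", "name": "Flawfinder"}}}
warnings = validate_with_warnings(report)
```

Returning the warnings instead of raising keeps the decision about failing the job in the caller, which matches the "pass and output a warning" behaviour described here.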

What is the type of buyer?

Enterprise Edition GitLab Ultimate

Is this a cross-stage feature?

Yes, this affects all secure stage products

Links / references

  • gitlab-org/security-products/ci-templates!144 (comment 402241936)
  • gitlab-org/security-products/analyzers/secrets!70 (comment 406405485)

/cc @NicoleSchwartz @gonzoyumo

Edited by Adam Cohen