Validate secure analyzer reports using JSON schema in QA stage
Problem to solve
The QA stage for secure analyzers currently uses a script to compare reports for equality. Now that `status`, `start_time`, and `end_time` have been added to SAST, CS, and DS reports, we had to update `compare_reports.sh` to ignore `scan.start_time` and `scan.end_time` when comparing reports, in order to allow comparing two reports with different timestamps.

The pattern when adding variable fields, such as timestamps or versions, has been to delete them from the reports before comparing them, thereby ignoring any variation between the values. However, doing this may result in false negatives: for example, if someone were to remove the `start_time` or `end_time` fields from the scanner code, the QA tests would not fail, because these fields are ignored.
I attempted to solve this issue by keeping the report keys while replacing the values with placeholder text, but it made the `jq` filter command much more complex:
```sh
jq_filter="del(.version) |
  del(.scan.scanner.version) |
  .scan.start_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
  .scan.end_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
  del(.vulnerabilities[]|.location.image) |
  del(.vulnerabilities[].id) |
  del(.remediations[].fixes[].id) |
  .vulnerabilities |= map_values(.links |= (. // [])) |
  .vulnerabilities |= map_values(.identifiers |= (. // [])) |
  (.. | arrays) |= sort"
```
It also didn't make sense to change this only for the `start_time` and `end_time` fields without addressing all the other fields that are deleted, such as `.scan.scanner.version` or `.vulnerabilities[]|.location.image`.
Intended users
User experience goal
QA stage should fail when comparing reports that don't have the same keys.
Proposal
- Add a second comparison stage to the `compare_reports.sh` script to ensure that both reports have the same keys.
- Add a second validation stage to the `compare_reports.sh` script to ensure that the generated report JSON can be validated against the Security Report Schemas.
Implementation plan
As stated in #244829 (closed), the solution could be either to nullify all the non-comparable value fields, or to verify that the fields exist in a second validation command. For the latter, a list of all currently nullified fields can be created and iterated through, verifying that each key exists in the object via `has`, or via a recursive lookup in every object checking that the current path matches the keys of that object.
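The field-presence check could be sketched as follows. The path list and the sample report are illustrative (derived from the jq filter shown above), not the actual list used by `compare_reports.sh`:

```python
import json

# Paths that the jq filter above deletes or nullifies before diffing;
# this subset is illustrative only.
REQUIRED_PATHS = [
    ["version"],
    ["scan", "scanner", "version"],
    ["scan", "start_time"],
    ["scan", "end_time"],
]

def has_path(obj, path):
    """Return True if every key along `path` exists in the nested dict."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return False
        obj = obj[key]
    return True

report = json.loads(
    '{"version": "3.0", "scan": {"scanner": {"version": "2.1.4"},'
    ' "start_time": "2020-09-03T02:21:52"}}'
)
missing = [".".join(p) for p in REQUIRED_PATHS if not has_path(report, p)]
print(missing)  # → ['scan.end_time']
```

With this approach the values can still be deleted for the equality diff, while a missing key is reported as a failure instead of being silently ignored.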
- Modify `compare_reports.sh` to do a member field check as described above.
- Test against 2 different reports (e.g. different scan times) and verify they continue to work.
- Add another stage to the `compare_reports.sh` script which does the following:
  - Determines the schema version of the report by extracting it from the JSON file.
  - Uses this schema version value and the report type to fetch the appropriate schema from the Security Report Schemas dist directory. For example, if the schema version is `7.0.1` and the report type is `sast`, the following schema should be retrieved:
  - Verifies the generated report JSON against the schema retrieved above using `py3-jsonschema`.
  - If the verification fails, a warning message should be output to explain why, an exit code of `0` will be returned, and the QA stage will succeed. As a follow-up, another MR ("feat: Use custom exit code for schema validation failures") will change this behaviour to return a `65` exit code and update the QA stages to `allow_failure` if this error code is encountered.
- Add `rspec` tests to verify the above behaviour against a variety of fixture files.
- Test the updated `compare_reports.sh` script to ensure:
  - all existing QA stages still pass
  - if the JSON schema validation fails, an error is output: https://gitlab.com/gitlab-org/security-products/tests/cplusplus/-/jobs/984256128
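The version/type extraction step above can be sketched as follows, assuming the field names shown in the report fragments in this issue (`version` at the top level, `scan.type`); the mapping from this pair to a concrete file in the Security Report Schemas dist directory is not reproduced here:

```python
import json

def schema_lookup_key(report_json):
    """Extract the schema version and report type from a report.

    These two values select which schema file to fetch from the
    Security Report Schemas dist directory.
    """
    report = json.loads(report_json)
    return report["version"], report["scan"]["type"]

print(schema_lookup_key('{"version": "7.0.1", "scan": {"type": "sast"}}'))
# → ('7.0.1', 'sast')
```

The validation step could then pass the fetched schema and the parsed report to py3-jsonschema, e.g. `jsonschema.validate(instance=report, schema=schema)`, catching `ValidationError` to produce the warning message.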
Further details
This change will prevent false negatives from occurring for our QA stages.
Availability & Testing
- all (or a sample of) analyzers' QA jobs continue to run and pass as they did before
What does success look like, and how can we measure that?
When running a QA test against two reports, one of which has the following `scan` object:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": { "name": "GitLab" },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success",
  "start_time": "2020-09-03T02:21:52",
  "end_time": "2020-09-03T02:21:52"
}
```

and one which doesn't have the `start_time` or `end_time` values:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": { "name": "GitLab" },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success"
}
```

the QA stage should fail.
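The key-set comparison behind this expectation can be sketched with a hypothetical helper (not the actual `compare_reports.sh` logic):

```python
import json

def key_paths(obj, prefix=""):
    """Flatten a nested dict into a sorted list of dotted key paths."""
    paths = []
    for key, value in obj.items():
        dotted = prefix + key
        paths.append(dotted)
        if isinstance(value, dict):
            paths.extend(key_paths(value, dotted + "."))
    return sorted(paths)

with_times = json.loads(
    '{"scan": {"status": "success",'
    ' "start_time": "2020-09-03T02:21:52", "end_time": "2020-09-03T02:21:52"}}'
)
without_times = json.loads('{"scan": {"status": "success"}}')

# The two reports have different key sets, so the QA stage should fail.
print(key_paths(with_times) == key_paths(without_times))  # → False
```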
When running a QA test against a report which is missing a required field, such as `start_time`:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": {
      "name": "GitLab"
    },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success"
}
```
the QA stage should pass and output a warning message:
```
{
  'status': 'success',
  'end_time': '2021-01-25T12:09:34',
  'type': 'sast',
  'scanner': {
    'version': '2.0.15',
    'vendor': {
      'name': 'GitLab'
    },
    'url': 'https://www.dwheeler.com/flawfinder',
    'name': 'Flawfinder',
    'id': 'flawfinder'
  }
}: 'start_time' is a required property
```
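The warn-but-succeed behaviour can be sketched with a hand-rolled required-property check standing in for py3-jsonschema; the property list and message format here are illustrative:

```python
import json

# Illustrative subset of required `scan` properties.
REQUIRED_SCAN_PROPERTIES = ("status", "start_time", "end_time")

def validate_scan(report):
    """Warn about missing required `scan` properties but still return 0,
    mirroring the initial behaviour (a follow-up MR switches this to 65)."""
    scan = report.get("scan", {})
    missing = [key for key in REQUIRED_SCAN_PROPERTIES if key not in scan]
    for key in missing:
        print(f"{json.dumps(scan)}: '{key}' is a required property")
    return 0

report = {"scan": {"status": "success", "end_time": "2021-01-25T12:09:34"}}
print(validate_scan(report))  # warns about start_time, then prints 0
```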
What is the type of buyer?
Enterprise Edition GitLab Ultimate
Is this a cross-stage feature?
Yes, this affects all secure stage products
Links / references
- gitlab-org/security-products/ci-templates!144 (comment 402241936)
- gitlab-org/security-products/analyzers/secrets!70 (comment 406405485)