Validate secure analyzer reports using JSON schema in QA stage
Problem to solve
The QA stage for secure analyzers currently uses a script to compare reports for equality. Now that `status`, `start_time`, and `end_time` have been added to SAST, CS, and DS reports, we had to update `compare_reports.sh` to ignore `scan.start_time` and `scan.end_time` when comparing reports, in order to allow comparing two reports with different timestamps.

The pattern when adding variable fields, such as timestamps or versions, has been to delete them from the reports before comparing them, thereby ignoring any variation between the values. However, doing this may result in false negatives: for example, if someone were to remove the `start_time` or `end_time` fields from the scanner code, the QA tests would not fail, because these fields are ignored.
I attempted to solve this issue by keeping the report keys while replacing the values with placeholder text, but it made the `jq` filter command much more complex:
```sh
jq_filter="del(.version) |
  del(.scan.scanner.version) |
  .scan.start_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
  .scan.end_time |= if .!=null then \"$TIME_PLACEHOLDER_VALUE\" else \"$KEY_NOT_EXISTS\" end |
  del(.vulnerabilities[]|.location.image) |
  del(.vulnerabilities[].id) |
  del(.remediations[].fixes[].id) |
  .vulnerabilities |= map_values(.links |= (. // [])) |
  .vulnerabilities |= map_values(.identifiers |= (. // [])) |
  (.. | arrays) |= sort"
```
It also didn't make sense to change this only for the `start_time` and `end_time` fields without addressing all the other fields that are deleted, such as `.scan.scanner.version` or `.vulnerabilities[]|.location.image`.
Intended users
User experience goal
QA stage should fail when comparing reports that don't have the same keys.
Proposal
- Add a second comparison stage to the `compare_reports.sh` script to ensure that both reports have the same keys.
- Add a second validation stage to the `compare_reports.sh` script to ensure that the generated report JSON can be validated against the Security Report Schemas.
Implementation plan
As stated in #244829 (closed), the solution could be either to nullify all the non-comparable value fields, or to verify that the fields exist in a second validation command. For the latter, a list of all currently nullified fields can be created and iterated through, verifying that each key exists in the object via `has`, or via a recursive lookup in every object checking that the current path matches the keys of that object.
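The field-presence check could be sketched as follows. The path list and the sample report are illustrative (derived from the jq filter shown above), not the actual list used by `compare_reports.sh`:

```python
import json

# Paths that the jq filter above deletes or nullifies before diffing;
# this subset is illustrative only.
REQUIRED_PATHS = [
    ["version"],
    ["scan", "scanner", "version"],
    ["scan", "start_time"],
    ["scan", "end_time"],
]

def has_path(obj, path):
    """Return True if every key along `path` exists in the nested dict."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return False
        obj = obj[key]
    return True

report = json.loads(
    '{"version": "3.0", "scan": {"scanner": {"version": "2.1.4"},'
    ' "start_time": "2020-09-03T02:21:52"}}'
)
missing = [".".join(p) for p in REQUIRED_PATHS if not has_path(report, p)]
print(missing)  # → ['scan.end_time']
```

With this approach the values can still be deleted for the equality diff, while a missing key is reported as a failure instead of being silently ignored.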
- Modify `compare_reports.sh` to do a member field check as described above.
- Test against 2 different reports (e.g. different scan times) and verify they continue to work.
- Add another stage to the `compare_reports.sh` script which does the following:
  - Determines the schema version of the report by extracting it from the JSON file.
  - Uses this schema version value and the report type to fetch the appropriate schema from the Security Report Schemas dist directory. For example, if the schema version is `7.0.1` and the report type is `sast`, the following schema should be retrieved:
  - Verifies the generated report JSON against the schema retrieved above using `py3-jsonschema`.
  - If the verification fails, a warning message should be output to explain why, an exit code of `0` will be returned, and the QA stage will succeed. As a follow-up, another MR ("feat: Use custom exit code for schema validation failures") will change this behaviour to return a `65` exit code and update the QA stages to `allow_failure` if this error code is encountered.
- Add `rspec` tests to verify the above behaviour against a variety of fixture files.
- Test the updated `compare_reports.sh` script to ensure:
  - all existing QA stages still pass
  - if the JSON schema validation fails, an error is output: https://gitlab.com/gitlab-org/security-products/tests/cplusplus/-/jobs/984256128
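The version/type extraction step above can be sketched as follows, assuming the field names shown in the report fragments in this issue (`version` at the top level, `scan.type`); the mapping from this pair to a concrete file in the Security Report Schemas dist directory is not reproduced here:

```python
import json

def schema_lookup_key(report_json):
    """Extract the schema version and report type from a report.

    These two values select which schema file to fetch from the
    Security Report Schemas dist directory.
    """
    report = json.loads(report_json)
    return report["version"], report["scan"]["type"]

print(schema_lookup_key('{"version": "7.0.1", "scan": {"type": "sast"}}'))
# → ('7.0.1', 'sast')
```

The validation step could then pass the fetched schema and the parsed report to py3-jsonschema, e.g. `jsonschema.validate(instance=report, schema=schema)`, catching `ValidationError` to produce the warning message.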
Further details
This change will prevent false negatives from occurring for our QA stages.
Availability & Testing
- all (or a sample of) analyzers' QA jobs continue to run and pass as they did before
What does success look like, and how can we measure that?
When running a QA test against two reports, one of which has the following `scan` object:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": { "name": "GitLab" },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success",
  "start_time": "2020-09-03T02:21:52",
  "end_time": "2020-09-03T02:21:52"
}
```

and one which doesn't have the `start_time` or `end_time` values:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": { "name": "GitLab" },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success"
}
```

the QA stage should fail.
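The key-set comparison behind this expectation can be sketched with a hypothetical helper (not the actual `compare_reports.sh` logic):

```python
import json

def key_paths(obj, prefix=""):
    """Flatten a nested dict into a sorted list of dotted key paths."""
    paths = []
    for key, value in obj.items():
        dotted = prefix + key
        paths.append(dotted)
        if isinstance(value, dict):
            paths.extend(key_paths(value, dotted + "."))
    return sorted(paths)

with_times = json.loads(
    '{"scan": {"status": "success",'
    ' "start_time": "2020-09-03T02:21:52", "end_time": "2020-09-03T02:21:52"}}'
)
without_times = json.loads('{"scan": {"status": "success"}}')

# The two reports have different key sets, so the QA stage should fail.
print(key_paths(with_times) == key_paths(without_times))  # → False
```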
When running a QA test against a report which is missing a required field, such as `start_time`:

```json
"scan": {
  "scanner": {
    "id": "clair",
    "name": "Clair",
    "url": "https://github.com/coreos/clair",
    "vendor": {
      "name": "GitLab"
    },
    "version": "2.1.4"
  },
  "type": "container_scanning",
  "status": "success"
}
```
the QA stage should pass and output a warning message:
```
{
  'status': 'success',
  'end_time': '2021-01-25T12:09:34',
  'type': 'sast',
  'scanner': {
    'version': '2.0.15',
    'vendor': {
      'name': 'GitLab'
    },
    'url': 'https://www.dwheeler.com/flawfinder',
    'name': 'Flawfinder',
    'id': 'flawfinder'
  }
}: 'start_time' is a required property
```
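The warn-but-succeed behaviour can be sketched with a hand-rolled required-property check standing in for py3-jsonschema; the property list and message format here are illustrative:

```python
import json

# Illustrative subset of required `scan` properties.
REQUIRED_SCAN_PROPERTIES = ("status", "start_time", "end_time")

def validate_scan(report):
    """Warn about missing required `scan` properties but still return 0,
    mirroring the initial behaviour (a follow-up MR switches this to 65)."""
    scan = report.get("scan", {})
    missing = [key for key in REQUIRED_SCAN_PROPERTIES if key not in scan]
    for key in missing:
        print(f"{json.dumps(scan)}: '{key}' is a required property")
    return 0

report = {"scan": {"status": "success", "end_time": "2021-01-25T12:09:34"}}
print(validate_scan(report))  # warns about start_time, then prints 0
```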
What is the type of buyer?
Enterprise Edition GitLab Ultimate
Is this a cross-stage feature?
Yes, this affects all secure stage products
Links / references
- gitlab-org/security-products/ci-templates!144 (comment 402241936)
- gitlab-org/security-products/analyzers/secrets!70 (comment 406405485)