One file with syntax errors should not stop SAST and similar kind of jobs from running
This SAST job failed for me recently, because apparently one file had an syntax error:
https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/86638388
In my opinion one unparseable file should not stop scanning of hundred files other files, so I would do something like this:
all_files= set of all files
compiled_files = {}
error_files = {}
for file in all_files
try
compiled = compile(file)
compiled_files = compiled_files ∪ {compiled}
catch e
error_files = error_files ∪ {file}
run_sast(compiled_files)
print "SAST could not scan " + error_files
At the moment we are simply scanning all javascript files in a repo. But it is totally legit to have corrupt or incorrect files in a repo, because you might be building a parser yourself and need those files for tests. You are testing systems how they react to broken inputs.
Activity Summary
- Certain scanners require "pre-processing steps" to perform their analysis: in some cases compilation, or dependency fetching, or comment stripping
- pre-processing steps can fail, causing the job to fail
- question: if pre-processing fails, should we attempt scan anyway? can we?
- answer: it depends.
- A "syntax error" here is an unsupported runtime essentially; new javascript features our parser doesn't recognize. If we can't strip comments what do we do? See below possible solutions.
- If compilation fails, we cannot build a project. If we can initiate a partial scan that's essentially reporting incomplete data. We should avoid false negatives as much as possible.
- answer: it depends.
- this issue seems largely isolated to javascript, we haven't seen other languages with similar issues (although it's possible)
Possible solutions
- Remove babel preprocessing #199660 (closed) - this should effectively solve the issue for
nodejs-scan
, the primary scanner that experiences this issue. A stop-gap is the implementation of #33065 (closed) which has a much lower weight and Level-of-effort. - Allow syntax errors - If we initiate a partial scan, that's essentially reporting incomplete data. We should avoid false negatives as much as possible. This is not viable, IMO
Edited by Thomas Woodham