Fix semgrep scans using a SAST_RULESET_GIT_REFERENCE repo that contains non-rule yaml files
Problem to solve
When the environment variable SAST_RULESET_GIT_REFERENCE is used with the semgrep analyzer and the referenced repository contains non-rule yaml files, semgrep may fail while trying to parse them as if they were rules.
Root cause
In this change from this MR, the TargetDir of the remote ruleset configuration is overwritten with the path containing the entire repository.
This TargetDir is then passed as the configuration for semgrep via -f.
The effect is that the entire repo referenced by SAST_RULESET_GIT_REFERENCE is subject to rule validation by semgrep, so any non-rule yaml file can cause semgrep to fail.
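To illustrate the effect, here is a minimal Python sketch (illustrative only, not the analyzer's code) that lists the yaml files under a directory that is passed as a whole to semgrep. The helper name `yaml_files_under` is hypothetical, and the sketch assumes, as described above, that every yaml file under the directory is treated as a rule candidate.

```python
from pathlib import Path


def yaml_files_under(config_dir):
    """Return every .yml/.yaml file under config_dir as sorted relative paths.

    Assumption: this approximates the set of files that would be subject to
    rule validation when the whole directory is used as the configuration.
    """
    root = Path(config_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )
```

Run against a clone that holds one rule file plus a CI configuration, the list would include both, which is why an unrelated yaml file can break the scan.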
The behavior to preserve
The TargetDir is overwritten because of the discussion in #393452 (comment 1297675190) and the decision in #393452 (comment 1353700401).
The resulting system behavior is summarized below.
Suppose that the analyzer is run from ProjectLocal with SAST_RULESET_GIT_REFERENCE=gitlab.com/ProjectRemote.
The ruleset configuration at gitlab.com/ProjectRemote/.gitlab/sast-ruleset.toml is parsed.
Passthroughs with type = "file" refer to files in gitlab.com/ProjectRemote, e.g. if gitlab.com/ProjectRemote/.gitlab/sast-ruleset.toml contains
[[semgrep.passthrough]]
type = "file"
value = "some_rule.yml"
then the rule referenced is gitlab.com/ProjectRemote/some_rule.yml and NOT ProjectLocal/some_rule.yml.
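The resolution rule above can be sketched as follows. This is a minimal Python illustration, not the analyzer's implementation: the function name `resolve_file_passthrough` and the `remote_clone_dir` parameter are hypothetical, and the check that rejects paths escaping the clone is an added assumption that the source does not discuss.

```python
from pathlib import Path


def resolve_file_passthrough(remote_clone_dir, value):
    """Resolve a type = "file" passthrough value against the cloned remote repo.

    The value is interpreted relative to the root of the cloned
    SAST_RULESET_GIT_REFERENCE repository, never the local project.
    """
    root = Path(remote_clone_dir).resolve()
    candidate = (root / value).resolve()
    # Assumption (not from the source): refuse values that leave the clone.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"passthrough value escapes the remote clone: {value}")
    return candidate
```

With a clone of ProjectRemote at some directory, `resolve_file_passthrough(clone_dir, "some_rule.yml")` yields the path to some_rule.yml inside that clone, regardless of what ProjectLocal contains.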
Related Issue
https://gitlab.com/gitlab-com/sec-sub-department/section-sec-request-for-help/-/issues/376+s
Proposal
- Retain the configured TargetDir of the remote ruleset configuration, but ensure that type = "file" passthrough values are from the cloned remote repository
- Update the ruleset dependency in analyzers/semgrep
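One way to picture the first proposal item is the Python sketch below. It is a sketch under stated assumptions, not the actual change: it assumes the cloned remote repository and the TargetDir are kept as separate directories and that only the files named by type = "file" passthroughs are copied into the TargetDir. The function name `stage_file_passthroughs` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def stage_file_passthroughs(remote_clone_dir, target_dir, values):
    """Copy only the referenced passthrough files from the clone into target_dir.

    Assumption: target_dir is what is later handed to semgrep via -f, so it
    should contain rule files only and none of the clone's other yaml files.
    """
    clone = Path(remote_clone_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for value in values:
        destination = target / value
        # Preserve any subdirectory structure used in the passthrough value.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(clone / value, destination)
        staged.append(destination)
    return staged
```

Under these assumptions, unrelated yaml files in the remote repository never reach the directory that semgrep validates.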