Fix semgrep scans using a SAST_RULESET_GIT_REFERENCE repo that contains non-rule yaml files

Problem to solve

When the environment variable SAST_RULESET_GIT_REFERENCE is used with the semgrep analyzer and the referenced repository contains non-rule yaml files, semgrep may fail while trying to parse those files as if they were rules.

Root cause

In this change from this MR, the TargetDir of the remote ruleset configuration is overwritten with the path containing the entire repository.

This TargetDir is then passed as the configuration for semgrep via -f.

The effect is that the entire repo referenced by SAST_RULESET_GIT_REFERENCE is subject to rule validation by semgrep, so any non-rule yaml file can cause semgrep to fail.
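To illustrate the scope problem, here is a minimal Python sketch. It is a simplified model and not the analyzer's or semgrep's actual code: it assumes that a directory passed as configuration is searched recursively for .yml/.yaml files. The repository layout (deploy/values.yaml) and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def yaml_files_under(config_dir: Path) -> list[str]:
    """Return every .yml/.yaml file below config_dir, relative to it, sorted."""
    found = [
        p.relative_to(config_dir).as_posix()
        for p in config_dir.rglob("*")
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    ]
    return sorted(found)


def make_remote_repo(root: Path) -> None:
    """Create a hypothetical ruleset repo with one rule file and one non-rule yaml file."""
    (root / "deploy").mkdir(parents=True)
    (root / "some_rule.yml").write_text("rules: []\n")
    # Not a semgrep rule file: it has no top-level "rules" key.
    (root / "deploy" / "values.yaml").write_text("replicas: 2\n")


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    make_remote_repo(repo)
    # Using the whole repository as the config directory sweeps in both files.
    candidates = yaml_files_under(repo)
    print(candidates)
```

Under this model, deploy/values.yaml becomes a candidate for rule validation only because it lives in the same repository as the rules.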

The behavior to preserve

The reason that the TargetDir is overwritten is ultimately based on the discussion #393452 (comment 1297675190) and the decision #393452 (comment 1353700401).

The resulting system behavior is summarized below.

Suppose that the analyzer is run from ProjectLocal with SAST_RULESET_GIT_REFERENCE=gitlab.com/ProjectRemote.

The ruleset configuration at gitlab.com/ProjectRemote/.gitlab/sast-ruleset.toml is parsed.

Passthroughs with type = "file" refer to files in gitlab.com/ProjectRemote, e.g. if gitlab.com/ProjectRemote/.gitlab/sast-ruleset.toml contains

  [[semgrep.passthrough]]
    type  = "file"
    value = "some_rule.yml"

then the rule referenced is gitlab.com/ProjectRemote/some_rule.yml and NOT ProjectLocal/some_rule.yml.
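The resolution rule above can be sketched in Python as follows. This is illustrative only: the function name and both directory paths are hypothetical, and the real ruleset module is not written in Python.

```python
from pathlib import Path


def resolve_file_passthrough(remote_root: Path, value: str) -> Path:
    """Resolve a type = "file" passthrough value against the cloned remote repository."""
    return remote_root / value


# Hypothetical locations of the remote clone and the local project checkout.
remote_root = Path("/tmp/clone/ProjectRemote")
local_root = Path("/builds/ProjectLocal")

rule_path = resolve_file_passthrough(remote_root, "some_rule.yml")
print(rule_path)
```

The local project root never enters the resolution, which is the behavior the fix has to keep.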

Related Issue

https://gitlab.com/gitlab-com/sec-sub-department/section-sec-request-for-help/-/issues/376

Proposal

  1. Retain the configured TargetDir of the remote ruleset configuration, but ensure that type = "file" passthrough values are resolved against the cloned remote repository

    see Entire remote ruleset repository should not be ... (gitlab-org/security-products/analyzers/ruleset!47 - merged) • Jason Leasure • 17.5

  2. Update the ruleset dependency in analyzers/semgrep
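One way to picture step 1 is the Python sketch below. It is an assumption about the general shape of the fix, not the code in ruleset!47: the clone directory and TargetDir stay separate, file passthroughs are read from the clone, and only the referenced files end up in TargetDir. All names and paths are hypothetical, and copying by base name is a simplification that ignores nested passthrough paths.

```python
import shutil
import tempfile
from pathlib import Path


def materialize_file_passthroughs(
    clone_dir: Path, target_dir: Path, values: list[str]
) -> list[Path]:
    """Copy each file passthrough from the clone into target_dir; return the copies."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for value in values:
        source = clone_dir / value                   # read from the remote clone
        destination = target_dir / Path(value).name  # write into TargetDir only
        shutil.copyfile(source, destination)
        copied.append(destination)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "clone"
    target = Path(tmp) / "target"
    (clone / "deploy").mkdir(parents=True)
    (clone / "some_rule.yml").write_text("rules: []\n")
    (clone / "deploy" / "values.yaml").write_text("replicas: 2\n")

    materialize_file_passthroughs(clone, target, ["some_rule.yml"])
    # Only the referenced rule is present in the directory handed to semgrep.
    target_contents = sorted(p.name for p in target.iterdir())
    print(target_contents)
```

With this separation, the non-rule file in the clone never reaches the directory that is passed to semgrep via -f.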

Edited by Jason Leasure