semgrep exit code 7 - append mode while chaining 2 or more file passthrough types with SAST_RULESET_GIT_REFERENCE
Summary
While configuring a custom ruleset for semgrep, you can chain 2 or more passthrough types using append mode. Each passthrough appends a single rule to the ruleset.
The example in our documentation chains 2 raw passthrough types, which works as expected. Appending a file type to a raw type also works. Appending a file type to another file type causes an error in semgrep, which exits with code 7.
According to the semgrep docs, exit code 7 suggests that at least one rule in the configuration is invalid.
While investigating, I found that a semgrep.sarif file is generated during the scan as an intermediate step before the final report (gl-sast-report.json). It contains either the detected vulnerabilities or a more concise error message.
Printing the semgrep.sarif file (for example, with cat /builds/group/project/semgrep.sarif in an after_script section of an overridden semgrep-sast job) reveals the following message:
"/tmp/glsastrulesetremoteref1341862880/rule-2.yml_0 was not a mapping"
This suggests that semgrep considers one of the YAML files used for chaining to have incorrect syntax.
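As a minimal sketch, the after_script override used to print the file could look like the following. The path is the placeholder from the text above; on a real runner, $CI_PROJECT_DIR/semgrep.sarif is assumed to point at the same file, since the scan writes it into the project directory.

```yaml
semgrep-sast:
  after_script:
    # Print the intermediate SARIF file so the underlying semgrep error
    # shows up in the job log. Replace the placeholder path with your own
    # project path (assumed equivalent: "$CI_PROJECT_DIR/semgrep.sarif").
    - cat /builds/group/project/semgrep.sarif
```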
Using the files provided in the steps to reproduce section, rule-2.yml is appended to rule-1.yml to form a complete ruleset, with rule-1.yml responsible for initialising the top-level rules object.
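For illustration, the combined target file would then look roughly like this. It assumes append mode concatenates the two files verbatim and that both use the same list indentation; the exact whitespace produced by the analyzer has not been verified here.

```yaml
# Combined target file: contents of the first rule file ...
rules:
  - id: "secret"
    patterns:
      - pattern-either:
          - pattern: '$MASK = "..."'
      - metavariable-regex:
          metavariable: "$MASK"
          regex: "(password|pass|passwd|pwd|secret|token)"
    message: |
      Use of hard-coded password
    metadata:
      cwe: "..."
    severity: "ERROR"
    languages:
      - "go"
  # ... followed by the contents of the second rule file, which on its own
  # is a bare list with no top-level "rules" mapping.
  - id: "insecure"
    patterns:
      - pattern: "func insecure() {...}"
    message: |
      Insecure function 'insecure' detected
    metadata:
      cwe: "..."
    severity: "ERROR"
    languages:
      - "go"
```

The combined file is a mapping with a single rules key, so it is valid. The second file taken alone is only a list, which matches the "was not a mapping" error.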
During the scan, the command that runs is: /usr/local/bin/semgrep -f /tmp/glsastrulesetremoteref0123456789 -o /builds/<group>/semgrep-custom-rules-test-extra/semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --enable-metrics --verbose
The /tmp/glsastrulesetremoteref0123456789 directory in the container holds the individual YAML files that make up the ruleset (rule-1.yml and rule-2.yml) as well as the combined target file, my-rules.yml, which is the only file that should be passed to the -f flag.
Because -f is given the whole directory, semgrep loads every file in it as a rule file. rule-1.yml and my-rules.yml are valid from semgrep's point of view, but rule-2.yml, which does not start with the top-level rules object and exists only to build the final file, is rejected as invalid.
Steps to reproduce
- Set up a project that contains the following files (Project A)
rule-1.yml
rules:
  - id: "secret"
    patterns:
      - pattern-either:
          - pattern: '$MASK = "..."'
      - metavariable-regex:
          metavariable: "$MASK"
          regex: "(password|pass|passwd|pwd|secret|token)"
    message: |
      Use of hard-coded password
    metadata:
      cwe: "..."
    severity: "ERROR"
    languages:
      - "go"
rule-2.yml
  - id: "insecure"
    patterns:
      - pattern: "func insecure() {...}"
    message: |
      Insecure function 'insecure' detected
    metadata:
      cwe: "..."
    severity: "ERROR"
    languages:
      - "go"
.gitlab/sast-ruleset.toml
[semgrep]
description = "My custom ruleset for Semgrep"
targetdir = "/sgrules"
validate = true
[[semgrep.passthrough]]
type = "file"
target = "my-rules.yml"
value = "rule-1.yml"
[[semgrep.passthrough]]
type = "file"
mode = "append"
target = "my-rules.yml"
value = "rule-2.yml"
- Set up another project (Project B) that contains your CI configuration and a sample .go file with vulnerabilities caught by the rules in step 1:
.gitlab-ci.yml
include:
  - template: Jobs/SAST.gitlab-ci.yml
variables:
  SAST_RULESET_GIT_REFERENCE: "gitlab-ci-token:$CI_JOB_TOKEN@gitlab.com/<group>/project A"
  SECURE_LOG_LEVEL: "debug"
semgrep-sast:
  script:
    - /analyzer run
test.go
package main

import (
	"fmt"
	"strings"
	"syscall"

	"golang.org/x/crypto/ssh/terminal"
)

func insecure() {
	// Initialize password with a default value
	password := "defaultpassword"

	fmt.Print("Enter password: ")
	bytePassword, _ := terminal.ReadPassword(int(syscall.Stdin))
	password = string(bytePassword)

	// Trim any leading/trailing spaces or newline characters
	password = strings.TrimSpace(password)

	fmt.Printf("\nPassword entered: %s\n", password)
}

func main() {
	insecure()
}
- (optional - depends on which credentials you are using for step 2)
If using the CI_JOB_TOKEN for authentication, in project A, navigate to Settings > CI/CD > Token Access. Allow CI job tokens from Project B to access Project A
- Run the scan
Example Project
What is the current bug behavior?
Semgrep fails with exit code 7 when chaining 2 or more file passthrough types with SAST_RULESET_GIT_REFERENCE, because one or more of the intermediate files has incorrect syntax from semgrep's perspective.
What is the expected correct behavior?
Semgrep should use only the target file for its custom rules during the scan.
Relevant logs and/or screenshots
[FATA] [Semgrep] [2023-08-31T16:12:58Z] [/go/src/buildapp/main.go:28] ▶ exit status 7