semgrep exit code 7 - append mode while chaining 2 or more file passthrough types with SAST_RULESET_GIT_REFERENCE

Summary

While configuring a custom ruleset for semgrep, you can chain 2 or more passthrough types using append mode. Each passthrough appends a single rule to the ruleset.

The example in our documentation chains 2 raw passthrough types, which works as expected. Appending a file type to a raw type also works. Appending a file type to another file type, however, causes an error in semgrep, which exits with code 7.

According to the semgrep docs, exit code 7 suggests that at least one rule in the configuration is invalid.

While investigating the issue, I found that as an intermediate step in generating the report (gl-sast-report.json), the scan produces a semgrep.sarif file that contains either the detected vulnerabilities or a more concise error message.

Printing the semgrep.sarif file (for example, by running cat /builds/group/project/semgrep.sarif in the after_script section of an overridden semgrep-sast job) reveals the following message:

"/tmp/glsastrulesetremoteref1341862880/rule-2.yml_0 was not a mapping"

This suggests that one of the YAML files used for chaining has incorrect syntax.
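
The after_script override mentioned above can be sketched as follows. This is a minimal example, not the exact job used for this report. It assumes that semgrep.sarif is written to the project root (which the -o path in the scan command shown later in this description suggests) and that cat is available in the analyzer image. $CI_PROJECT_DIR is GitLab's predefined variable for the project directory inside the job.

```yaml
include:
  - template: Jobs/SAST.gitlab-ci.yml

semgrep-sast:
  after_script:
    # after_script also runs when the job fails, so the SARIF file can
    # still be inspected after semgrep exits with code 7.
    # Assumption: the analyzer writes semgrep.sarif to the project root.
    - cat "$CI_PROJECT_DIR/semgrep.sarif"
```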

Using the files provided in the steps to reproduce section, rule-2.yml is appended to rule-1.yml to form a complete ruleset, with rule-1.yml responsible for initialising the top-level rules object.
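
For illustration, the combined target file that the append chain is expected to produce would look roughly like the following. It is assembled by hand from the two rule files in the steps to reproduce section, not copied from the analyzer's output, and it assumes that append mode is a plain concatenation of the two files.

```yaml
# my-rules.yml (expected result of appending the second file to the first)
rules:
# From the first passthrough: initialises the top-level rules object
- id: "secret"
  patterns:
    - pattern-either:
        - pattern: '$MASK = "..."'
    - metavariable-regex:
        metavariable: "$MASK"
        regex: "(password|pass|passwd|pwd|secret|token)"
  message: |
    Use of hard-coded password
  metadata:
    cwe: "..."
  severity: "ERROR"
  languages:
    - "go"
# From the second passthrough: a bare list item, valid only once appended
- id: "insecure"
  patterns:
    - pattern: "func insecure() {...}"
  message: |
    Insecure function 'insecure' detected
  metadata:
    cwe: "..."
  severity: "ERROR"
  languages:
    - "go"
```

Read on its own, the second file is a YAML sequence rather than a mapping with a rules key, which is consistent with the "was not a mapping" message.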

During the scan, the command that runs is: /usr/local/bin/semgrep -f /tmp/glsastrulesetremoteref0123456789 -o /builds/<group>/semgrep-custom-rules-test-extra/semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --enable-metrics --verbose

The /tmp/glsastrulesetremoteref0123456789 directory in the container holds the individual YAML files that make up the ruleset (rule-1.yml and rule-2.yml) as well as the combined file, my-rules.yml, which is the target and the only file that should be used with the -f flag.

Because the -f flag checks the entire directory for rules, semgrep treats every file in it as a rule file. From semgrep's point of view, rule-1.yml and my-rules.yml have correct syntax, but rule-2.yml, which does not start with the top-level rules object and exists only to generate the final file, is considered to have incorrect syntax.

Steps to reproduce

Set up a project that contains the following files (Project A):

rule-1.yml

rules:
- id: "secret"
  patterns:
    - pattern-either:
        - pattern: '$MASK = "..."'
    - metavariable-regex:
        metavariable: "$MASK"
        regex: "(password|pass|passwd|pwd|secret|token)"
  message: |
    Use of hard-coded password
  metadata:
    cwe: "..."
  severity: "ERROR"
  languages:
    - "go"

rule-2.yml

- id: "insecure"
  patterns:
    - pattern: "func insecure() {...}"
  message: |
    Insecure function 'insecure' detected
  metadata:
    cwe: "..."
  severity: "ERROR"
  languages:
    - "go"

.gitlab/sast-ruleset.toml

[semgrep]
  description = "My custom ruleset for Semgrep"
  targetdir = "/sgrules"
  validate = true

  [[semgrep.passthrough]]
    type  = "file"
    target = "my-rules.yml"
    value = "rule-1.yml"
  
  [[semgrep.passthrough]]
    type  = "file"
    mode = "append"
    target = "my-rules.yml"
    value = "rule-2.yml"

Set up another project (Project B) that contains your CI configuration and a sample .go file with vulnerabilities caught by the rules in step 1:

.gitlab-ci.yml

include:
  - template: Jobs/SAST.gitlab-ci.yml

variables:
  SAST_RULESET_GIT_REFERENCE: "gitlab-ci-token:$CI_JOB_TOKEN@gitlab.com/<group>/project A"
  SECURE_LOG_LEVEL: "debug"

semgrep-sast:
  script:
    - /analyzer run

test.go

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"syscall"
	"golang.org/x/crypto/ssh/terminal"
)

func insecure() {
	// Initialize password with a default value
	password := "defaultpassword"

	// The reader is not needed by this sample; discard it so the file compiles
	_ = bufio.NewReader(os.Stdin)

	fmt.Print("Enter password: ")


	bytePassword, _ := terminal.ReadPassword(int(syscall.Stdin))
	password = string(bytePassword)

	// Trim any leading/trailing spaces or newline characters
	password = strings.TrimSpace(password)

	fmt.Printf("\nPassword entered: %s\n", password)
}

func main() {
	insecure()
}

(Optional, depending on which credentials you are using for step 2)

If using the CI_JOB_TOKEN for authentication, in Project A, navigate to Settings > CI/CD > Token Access and allow CI job tokens from Project B to access Project A.

Run the scan

Example Project

What is the current bug behavior?

Semgrep fails with exit code 7 while chaining 2 or more file passthrough types with SAST_RULESET_GIT_REFERENCE, because one or more of the files have incorrect syntax from semgrep's perspective.

What is the expected correct behavior?

Semgrep should only use the target file for its custom rules during the scan.

Relevant logs and/or screenshots

[FATA] [Semgrep] [2023-08-31T16:12:58Z] [/go/src/buildapp/main.go:28] ▶ exit status 7

Output of checks

Possible fixes

Edited Dec 18, 2023 by Christopher Mutua