Rule-pack synthesis using rules from multiple sources
Proposal
For SAST analyzers that have already transitioned to Semgrep rules or to other custom rules (including VET), GitLab users may want to build and run their own custom rulesets, assembled from various data sources.
These data sources may include:
- git repository
- local (sourced from the local file system)
- raw (inline content embedded in the configuration)
- url (fetched from a remote location)
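For the git data source, pinning to an exact ref (a full commit SHA or a fully qualified branch ref such as refs/heads/develop) keeps an assembled rule pack reproducible. The following is a minimal Python sketch, assuming a git CLI on the PATH, that plans the invocations for one git source and resolves an optional subdirectory. The names git_fetch_commands, rules_root and fetch are hypothetical, and the init/fetch/checkout sequence is just one possible approach, not how any GitLab analyzer necessarily does it. Fetching a bare commit SHA also depends on the server allowing it.

```python
import subprocess
from pathlib import Path


def git_fetch_commands(url: str, ref: str, workdir: str) -> list[list[str]]:
    """Plan the git invocations that fetch a single ref from url into workdir."""
    return [
        ["git", "init", "--quiet", workdir],
        ["git", "-C", workdir, "remote", "add", "origin", url],
        # Shallow fetch of only the requested ref (branch ref or commit SHA).
        ["git", "-C", workdir, "fetch", "--quiet", "--depth", "1", "origin", ref],
        # FETCH_HEAD points at whatever the previous fetch retrieved.
        ["git", "-C", workdir, "checkout", "--quiet", "FETCH_HEAD"],
    ]


def rules_root(workdir: str, subdir: str = "") -> Path:
    """Directory to read rules from; an empty subdir means the repository root."""
    return Path(workdir) / subdir


def fetch(url: str, ref: str, workdir: str, subdir: str = "") -> Path:
    """Run the planned commands (requires git and network access)."""
    for cmd in git_fetch_commands(url, ref, workdir):
        subprocess.run(cmd, check=True)
    return rules_root(workdir, subdir)
```

Separating the plan from its execution keeps the logic testable offline; a call such as fetch("https://gitlab.com/julianthome/semgrep-rules", "refs/heads/develop", "/tmp/semgrep-rules", subdir="go") would then return the directory containing the Go rules.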
Users may also want to disable or override a subset of the rules.
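This proposal does not yet define a syntax for disabling or overriding rules. As a minimal sketch, assume the rules have already been parsed into dictionaries shaped like entries of a Semgrep rule file (id, severity, languages, ...), and that a hypothetical policy supplies a set of disabled rule IDs plus per-ID field overrides. The function apply_policy is likewise a hypothetical name.

```python
import copy


def apply_policy(
    rules: list[dict],
    disabled: set[str],
    overrides: dict[str, dict],
) -> list[dict]:
    """Drop disabled rules and apply top-level field overrides, keeping rule order."""
    result = []
    for rule in rules:
        rule_id = rule["id"]
        if rule_id in disabled:
            continue  # disabled rules are omitted from the assembled pack
        # Deep-copy so the caller's parsed rules are never mutated.
        patched = copy.deepcopy(rule)
        # Shallow merge: an override replaces a whole top-level field.
        patched.update(overrides.get(rule_id, {}))
        result.append(patched)
    return result


rules = [
    {"id": "Foo", "severity": "ERROR", "languages": ["go"]},
    {"id": "Bar", "severity": "WARNING", "languages": ["go"]},
]
print(apply_policy(rules, disabled={"Bar"}, overrides={"Foo": {"severity": "INFO"}}))
# prints [{'id': 'Foo', 'severity': 'INFO', 'languages': ['go']}]
```

A shallow merge keeps overrides predictable (severity or message is replaced wholesale); merging nested fields such as metadata would need an explicit rule of its own.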
We could extend the current rule syntax to support multiple data sources. Rules (Semgrep rules in the example below) can be sourced from several places: file, raw, git, and url. The configuration file below shows an example configuration we could use to build a custom ruleset, myrulepack.yml.
In essence, the configuration below assembles a rule pack by pulling rules from a URL, local files, two arbitrary git repositories, and raw/inline configuration:
```toml
[semgrep]
  description = 'semgrep custom rules configuration'
  targetdir = "/sgrules"
  validate = true

  [[semgrep.passthrough]]
    type = "raw"
    value = """# My ruleset
"""
    target = "rule.yml"

  [[semgrep.passthrough]]
    type = "url"
    value = "https://semgrep.dev/c/p/gosec"
    target = "rule.yml"
    mode = "append"

  [[semgrep.passthrough]]
    type = "file"
    value = "foo.yml"
    target = "rule.yml"
    mode = "append"

  [[semgrep.passthrough]]
    type = "raw"
    mode = "append"
    target = "rule.yml"
    value = """
- id: "Foo"
  patterns:
    - pattern: "func Foo() {...}"
  message: |
    Function Foo detected
  metadata:
    cwe: "CWE-200: Exposure of Sensitive Information to an Unauthorized Actor"
  severity: "ERROR"
  languages:
    - "go"
"""

  [[semgrep.passthrough]]
    type = "file"
    value = "bar.yml"
    validator = "yaml"

  [[semgrep.passthrough]]
    type = "git"
    value = "https://github.com/dgryski/semgrep-go"
    ref = "b14e2f07411c22cadaab3a5d7df2346a99e7b36d"

  [[semgrep.passthrough]]
    type = "git"
    value = "https://gitlab.com/julianthome/semgrep-rules"
    subdir = "go"
    ref = "refs/heads/develop"
```