Document rules and CWE coverage for GitLab SAST
It is currently difficult for customers to understand the set of vulnerabilities GitLab SAST looks for. Customers can piece this together today by digging through the sast-rules repository, but that approach is incomplete and inconvenient.
As an initial solution to this problem, we should add new documentation content that enumerates the available rules. Similar docs exist today for Secret Detection and Dynamic Analysis.
Scope of coverage
We must document the coverage of Advanced SAST rules. We should document the summarized coverage of sast-rules.
If we need to choose only one (Advanced SAST or the sast-rules repo), we should choose Advanced SAST. This is because Advanced SAST rules are not searchable online, unlike the public sast-rules repo.
The two sets of rules should not be mixed together, and tiering should be clearly marked. Advanced SAST rules are Ultimate-only, while sast-rules are primarily available to all tiers.
Source of truth
This must be based on the shipped rules, not on the vulnerability database. (Previously we used the vulns DB to generate stats like this; that is not necessary when we control the rulesets.)
We should focus on rules we manage; we do not need to target upstream rulesets like those of SpotBugs, PMD-Apex, or Sobelow.
What content to publish
We can start by focusing on summary counts, rather than duplicating information like rule descriptions. This is because:
- Rule descriptions are quite long; with hundreds of rules, this could easily be tens of thousands of words in the documentation.
- Anyone who has a finding already has access to this content, since it is included in the vulnerability record.
- Descriptions may either substantially describe the actual detection logic (unnecessary detail) or be more vague than is helpful (not much added value compared to a title/CWE/etc.).
The most detail we should consider is a list of titles describing what the rules find, but with hundreds of rules even that could be unwieldy.
Automation
It is acceptable for the content to be manually added to docs. However, we must automate and internally document the process of generating the content so that we can update the docs content on a recurring basis.
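As a sketch of what that recurring generation step could look like, the snippet below summarizes rule counts per CWE from Semgrep-style rule definitions like those in the sast-rules repo. The `metadata.cwe` field follows Semgrep conventions, but the exact layout of our rules may differ, so treat the field names as assumptions.

```python
# Hypothetical doc-generation helper: count shipped rules per CWE.
# Assumes Semgrep-style rules where metadata.cwe is a string or a list
# of strings such as "CWE-89: SQL Injection".
from collections import Counter

def cwe_counts(rules):
    """Count rules per CWE ID. Each rule is a dict parsed from rule YAML."""
    counts = Counter()
    for rule in rules:
        cwes = rule.get("metadata", {}).get("cwe", [])
        if isinstance(cwes, str):  # Semgrep allows a single string or a list
            cwes = [cwes]
        for entry in cwes:
            # Entries look like "CWE-89: SQL Injection"; keep only the ID.
            counts[entry.split(":")[0].strip()] += 1
    return dict(counts)

# Inline example data; a real run would yaml.safe_load the files in the repo.
sample = [
    {"id": "python-sqli", "metadata": {"cwe": "CWE-89: SQL Injection"}},
    {"id": "go-sqli", "metadata": {"cwe": ["CWE-89: SQL Injection"]}},
    {"id": "js-path", "metadata": {"cwe": "CWE-22: Path Traversal"}},
]
print(cwe_counts(sample))
```

The summary-counts output of a script like this could be pasted into the docs page on each refresh, keeping the manual step small.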
Grouping
CWEs
We certainly need to group by CWEs, perhaps in a tree-style view that reflects the hierarchical nature of CWE. The underlying reasons are:
- People need to know that bug classes like path traversal or injection are covered, and won't necessarily know to ask "is injection via technology XYZ covered?".
- Our rules are only going to be so specific; for example, they'll likely be more generic than extremely specific CWEs like CWE-39: Path Traversal: 'C:dirname'.
Going all the way to the topmost parent CWE in the hierarchy would seem ideal, although sometimes the root is rather vague. For example, CWE-77: Command Injection is a ChildOf CWE-74: Injection, but we want to be able to show that we cover SQLi, CMDi, etc. Perhaps we generally go to the topmost CWE, but keep a list of "overrides" to stop ourselves from going too far up the chain into unhelpful genericity (like "CWE-74: Injection").
CWEs also have different "views" that relate them in different ParentOf/ChildOf/PeerOf relationships. "Research Concepts" (CWE-1000) seems the most comprehensive, at first glance.
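The "roll up, with overrides" idea above could be sketched as follows. The parent map is a tiny hand-written excerpt of the ChildOf relationships in the CWE-1000 (Research Concepts) view; a real implementation would load the full hierarchy from MITRE's CWE XML export, and the contents of the override list are a judgment call.

```python
# Excerpt of ChildOf links from the CWE-1000 (Research Concepts) view.
PARENT = {
    "CWE-89": "CWE-943",   # SQL Injection -> Improper Neutralization in Data Query Logic
    "CWE-943": "CWE-74",
    "CWE-77": "CWE-74",    # Command Injection -> Injection
    "CWE-74": "CWE-707",
    "CWE-23": "CWE-22",    # Relative Path Traversal -> Path Traversal
}

# Overrides: stop rolling up before reaching these overly generic parents,
# so SQLi and CMDi stay distinguishable in the docs. Illustrative choice.
TOO_GENERIC = {"CWE-943", "CWE-74", "CWE-707"}

def display_cwe(cwe_id):
    """Walk ChildOf links upward, stopping before any TOO_GENERIC ancestor."""
    current = cwe_id
    while current in PARENT and PARENT[current] not in TOO_GENERIC:
        current = PARENT[current]
    return current
```

With this override list, CWE-89 (SQLi) and CWE-77 (CMDi) display as themselves rather than collapsing into "CWE-74: Injection", while CWE-23 still rolls up to the more recognizable CWE-22.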
OWASP Top 10
To support people's desire to check whether we support OWASP Top 10, we should also have a way of showing this.
We can limit ourselves to the most recent Top 10 version.
This can be separate from the CWE-oriented listing, or an annotation on that existing display.
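If we go the annotation route, a minimal sketch could tag each CWE in the existing listing with its OWASP Top 10 (2021) category. The mapping excerpt below follows OWASP's published CWE mappings for 2021; a full table would cover all ten categories.

```python
# Excerpt of OWASP Top 10 (2021) CWE mappings, per owasp.org.
OWASP_2021 = {
    "CWE-22": "A01:2021 - Broken Access Control",
    "CWE-79": "A03:2021 - Injection",
    "CWE-89": "A03:2021 - Injection",
}

def annotate(cwe_ids):
    """Return (cwe, owasp_category_or_None) pairs for the docs listing."""
    return [(c, OWASP_2021.get(c)) for c in sorted(cwe_ids)]
```

CWEs without a Top 10 mapping annotate as `None`, which the docs page could simply leave blank.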
Languages
If possible, we should be able to show these stats per programming language. This is not strictly necessary in the first iteration, but it would be very helpful because detection rules differ per language.
Example user questions
- Which CWEs are covered by GitLab SAST?
- Does GitLab SAST cover the OWASP Top 10? (People very often ask this, even though it is very reductive and even though the Top 10 team says "Tools cannot comprehensively detect, test, or protect against the OWASP Top 10 due to the nature of several of the OWASP Top 10 risks, with reference to A04:2021-Insecure Design.")
- Does GitLab SAST detect [insert common attack] in [insert language of interest]? For example, "Does GitLab SAST cover SQLi in Python?"
  - Yes, this is also a reductive summary because there are many ways this could be done, but currently we can't even easily answer "yes" to a simple question like this.
Users may ask questions like this because:
- They're doing a basic evaluation of tools on the market and need to quickly verify that we cover certain vuln types.
  - Our solution should completely satisfy this use case.
- They're trying to determine if a false negative is due to a completely missing rule, or if their specific example just wasn't targeted by our rules.
  - Our solution should at least answer the first level of inquiry here.
Alternatives considered
- In-product visibility would be convenient, but is problematic because we don't have a way of tying analyzers' rules to the platform release. That is, we can have cases like:
  - Old self-managed releases with new analyzer versions.
  - Old (pinned) analyzer versions used in a newer self-managed release, or on GitLab.com.
- Publishing rules outside of docs could avoid some of the logistical challenges of deploying in the docs site. However:
  - This would require establishing and hosting a separate site, similar to how we maintain https://advisories.gitlab.com or how Sonar maintains https://rules.sonarsource.com/.
  - Content from the docs site is automatically packaged up and included in the self-managed release. This makes it possible for offline customers to have the same content.
  - The SAST rules overview is in docs already.