Improve Scala ruleset coverage
What does this MR do?
- Enhance use-case coverage for Scala rules in comparison to the native analyzer (Spotbugs/FindSecBugs). See the reference job result containing the improvements.
- Remove the redundant mapping of `find_sec_bugs` in the Scala mapping file.
- Introduce a `native_analyzer` property in the mappings files and use it as the primary ID prefix instead of the mapping file's name. Two reasons:
  - Until now, there has been a 1:1 mapping between each Semgrep ruleset and its associated native analyzer for every language, so the mapping file was named after the native analyzer and that name was used as the prefix in primary ID generation. The Spotbugs/FindSecBugs analyzer breaks this approach: it supports multiple languages (Java/Scala/Kotlin/Groovy) with almost the same set of rules, whereas Semgrep rulesets differ per language. So we introduced a new property, `native_analyzer`, in all the mappings files to represent the native analyzer. For example, the Scala rulesets are added in the `find_sec_bugs_scala.yml` mapping file, which contains `find_sec_bugs` as the value of its `native_analyzer` property.
  - Because the Spotbugs analyzer treats all languages the same, all of its vulnerabilities have the `find_sec_bugs` prefix in their primary ID. The Semgrep analyzer should therefore keep the same prefix (`find_sec_bugs`) in its primary ID for all Spotbugs-supported languages, which ensures findings are not duplicated. With the `native_analyzer` property in place, we generate all Semgrep rulesets with this property's value as the primary ID prefix instead of the mapping file's name.
- Gap-analysis job results for the new ruleset changes: reference. Note that use-case coverage for some rules might seem incomplete even though it is not, because those rules match the vulnerability pattern at different line numbers. You can find more details in the progress tracker issue.
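
For reviewers, the `native_analyzer` prefix change can be illustrated with a minimal Python sketch. This is not the actual generator code: the in-memory `MAPPINGS` dict, the function names, the `.` separator, and the sample rule ID are all assumptions made for illustration. Only the `find_sec_bugs_scala.yml` file name and the `find_sec_bugs` value come from this MR.

```python
import os

# Hypothetical, simplified stand-in for a parsed mapping file.
# The real mapping schema may contain more fields and differ in shape.
MAPPINGS = {
    "find_sec_bugs_scala.yml": {"native_analyzer": "find_sec_bugs"},
}


def legacy_primary_id(mapping_file: str, rule_id: str) -> str:
    """Old behaviour (as described above): prefix is the mapping file's name."""
    prefix, _ext = os.path.splitext(mapping_file)
    return f"{prefix}.{rule_id}"  # "." separator is an assumption


def primary_id(mapping_file: str, rule_id: str) -> str:
    """New behaviour: prefix is the mapping's native_analyzer value."""
    prefix = MAPPINGS[mapping_file]["native_analyzer"]
    return f"{prefix}.{rule_id}"  # "." separator is an assumption


if __name__ == "__main__":
    rule = "PREDICTABLE_RANDOM"  # illustrative rule ID only
    print(legacy_primary_id("find_sec_bugs_scala.yml", rule))
    print(primary_id("find_sec_bugs_scala.yml", rule))
```

Under these assumptions, the old scheme would produce a Scala-specific prefix (`find_sec_bugs_scala`), while the new scheme produces the same `find_sec_bugs` prefix that Spotbugs uses for every language, which is what keeps findings from being duplicated.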
Relevant Issues:
Edited by Vishwa Bhat