
Build Docker image with r2c-hosted ruleset instead of local rules

mschwager requested to merge mschwager/semgrep:mschwager-r2c-ruleset into main

TL;DR: r2c would like to use the rules hosted on our systems as the single source of truth for this analyzer.

What does this MR do?

Hi all,

I'm posting this MR early to get quick feedback before proceeding with this approach. The r2c team is still working on the relevant rulesets that this MR will download. I'll update this MR as additional development occurs.

The goal of this MR is to reconcile r2c's and GitLab's rules into rulesets that we can both use, while also adhering to our licensing requirements. My ask is that you review this MR, confirm that it meets your expectations for analyzers, and give the general approach (though not necessarily the current ruleset) a 👍 so we can proceed quickly once the correct ruleset is in place.

This MR is operating under the following assumptions:

  1. Ruleset hash pinning is not yet implemented (e.g. /p/semgrep-sast@123abc). This functionality is unlikely to land before GitLab's 14.0 release. Because of this, we will manually ensure that the rulesets we provide to this analyzer do not change until hash pinning is in place, at which point we can enforce automatic verification.
  2. We are baking the rulesets into the Docker image at build time rather than fetching them from their URLs at analyzer run time. This mitigates reliability concerns and allows the Docker images to run in air-gapped environments.
  3. Rule IDs will now be prefixed with "gitlab." (e.g. bandit.B101 -> gitlab.bandit.B101). This introduces some backwards-compatibility concerns, since existing references to the old IDs will no longer match. We have some flexibility here, but we feel this will give the best user experience across our two systems.
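
To make the backwards-compatibility concern in assumption 3 concrete, here is a minimal sketch of how old and new rule IDs could be compared. It is an illustration only and not part of this MR; the helper names (`add_prefix`, `strip_prefix`, `is_same_rule`) are hypothetical, and the only detail taken from the text above is the `gitlab.` prefix.

```python
# Hypothetical helpers for mapping between legacy and prefixed rule IDs.
# Only the "gitlab." prefix comes from the MR description; everything
# else is an assumption made for illustration.
GITLAB_PREFIX = "gitlab."


def add_prefix(rule_id: str) -> str:
    """Return the rule ID with the gitlab. prefix, leaving prefixed IDs alone."""
    if rule_id.startswith(GITLAB_PREFIX):
        return rule_id
    return GITLAB_PREFIX + rule_id


def strip_prefix(rule_id: str) -> str:
    """Return the legacy form of a rule ID by removing the gitlab. prefix."""
    if rule_id.startswith(GITLAB_PREFIX):
        return rule_id[len(GITLAB_PREFIX):]
    return rule_id


def is_same_rule(old_id: str, new_id: str) -> bool:
    """Treat two IDs as the same rule if they differ only by the prefix."""
    return strip_prefix(old_id) == strip_prefix(new_id)
```

A consumer that stored findings under the old IDs could use a comparison like `is_same_rule` to keep matching them after the rename, although whether that is needed depends on how the IDs are used downstream.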

From a technical standpoint, we will be making the following changes to our systems to fulfill the above requirements:

  • We will be ingesting your custom-written rules from the rules/ directory in this repository.
  • We will be storing our shared rules in the gitlab/ directory of the semgrep-rules repository.
  • We will then combine the above rules into two rulesets: /p/gitlab-bandit and /p/gitlab-eslint.
  • Finally, we will download the YAML from the above rulesets and bake it into the analyzer Docker image at build time.
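
As a rough illustration of the last step (this is not the code in this MR), the build-time download could look something like the sketch below. The ruleset paths come from the list above; the registry base URL, the output file naming, and the function names are assumptions, and the exact download endpoint is not specified here.

```python
import urllib.request
from pathlib import Path
from typing import Callable, List

# Ruleset paths named in this MR.
RULESETS = ["/p/gitlab-bandit", "/p/gitlab-eslint"]

# Assumption: the base URL of the registry serving the rulesets.
REGISTRY_BASE = "https://semgrep.dev"


def http_fetch(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def bake_rulesets(
    dest: Path,
    fetch: Callable[[str], bytes] = http_fetch,
    base: str = REGISTRY_BASE,
) -> List[Path]:
    """Fetch each ruleset once and write it to dest as a YAML file.

    Intended to run during the image build, so that no network access
    is needed when the analyzer itself runs.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for ruleset in RULESETS:
        body = fetch(base + ruleset)
        if not body.strip():
            # Fail the build rather than ship an empty ruleset.
            raise RuntimeError(f"empty response for ruleset {ruleset}")
        out = dest / (ruleset.rsplit("/", 1)[-1] + ".yml")
        out.write_bytes(body)
        written.append(out)
    return written
```

Passing the fetcher in as a parameter keeps the network call replaceable, which makes the step easy to exercise without reaching the registry.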

The code associated with this MR gives a rough outline of what the final changes will look like. Once everything is settled with the new approach, we can remove the shared rules from this repository, which should make de-duplication easier on our end. In the future we also plan to enable ruleset hash pinning for extra assurance.
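
Until hash pinning exists, one possible stopgap is to record a checksum of each baked ruleset and fail the build if it changes. The sketch below is only an illustration of that idea: it is not the planned hash pinning feature, it is not part of this MR, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_ruleset(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the recorded one."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(
            f"{path.name}: expected sha256 {expected_sha256}, got {actual}"
        )
```

The expected digests would need to be updated by hand whenever a ruleset is intentionally changed, which matches the manual process described in assumption 1.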

Does this seem like a reasonable approach? Let me know if you have any questions or concerns!

What are the relevant issue numbers?

Does this MR meet the acceptance criteria?

Edited by mschwager
