
Bump Semgrep version

Mark Florian requested to merge upgrade-semgrep into main

What does this MR do and why?

The main purpose of this MR is to upgrade to the latest version of Semgrep. The MR is large because Semgrep 1.45.0 changed how semgrep --test works, so we had to rewrite our tests. From the commit message of 3d469c44:

This is the start of the work needed to upgrade to Semgrep>=1.45.0. The idea is to ensure a one-to-one relation between each YAML rule file and its test file.

Until now, we've had a one-to-many relationship between YAML rule files and test files. For instance, we have many rules which target both .vue and .js files, or both .haml and .rb files. This looked like this:

rules/components/accordion/
├── accordion.haml
├── accordion.vue
└── accordion.yml

With https://github.com/semgrep/semgrep/pull/8993, Semgrep started ignoring the paths directive of rules. We could no longer structure our tests this way, because both the HAML and Vue files were then tested against every rule in accordion.yml, resulting in failures.
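To make the failure mode concrete, here is a toy model in Python. It is not Semgrep's actual matching logic; the rule IDs and the include globs are hypothetical stand-ins for the contents of accordion.yml. It only shows which rules end up running against each test file when the paths directive is honored versus ignored.

```python
from fnmatch import fnmatch

# Hypothetical rules standing in for the contents of accordion.yml.
RULES = [
    {"id": "accordion-vue-rule", "include": ["*.vue", "*.js"]},
    {"id": "accordion-haml-rule", "include": ["*.haml", "*.rb"]},
]

# Test files from the old one-to-many layout.
TEST_FILES = ["accordion.haml", "accordion.vue"]


def rules_applied(test_file, rules, honor_paths):
    """Return the IDs of the rules that would run against test_file.

    When honor_paths is False, every rule runs against every test file,
    which models the paths directive being ignored.
    """
    return [
        rule["id"]
        for rule in rules
        if not honor_paths
        or any(fnmatch(test_file, pattern) for pattern in rule["include"])
    ]


for test_file in TEST_FILES:
    honored = rules_applied(test_file, RULES, honor_paths=True)
    ignored = rules_applied(test_file, RULES, honor_paths=False)
    print(f"{test_file}: paths honored -> {honored}, paths ignored -> {ignored}")
```

In this model, accordion.haml is checked against the Vue/JS rule as soon as paths are ignored. The test file has no annotations for that rule, which is the kind of mismatch that produced the failures described above.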

The approach from now on is to have one test file (based on the base file name) for each YAML rule file. This now looks like this:

rules/components/accordion/
├── accordion-haml-rb.haml
├── accordion-haml-rb.yml
├── accordion-vue-js.vue
└── accordion-vue-js.yml

Now, the -vue-js.yml file only contains rules that target .vue or .js files, and the -haml-rb.yml file only contains rules that target .haml or .rb files. If a rule uses a language-specific parser, like javascript or ruby, it should live in its own YAML file, with its own specific test file.
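The one-to-one convention can be checked mechanically. The following is a minimal sketch, not part of this MR: the function names check_one_to_one and demo are hypothetical, and it assumes that any non-YAML file in a rule directory is a test file. It groups files by base name and reports every base name that does not have exactly one YAML file and exactly one test file.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumption: rule files use one of these suffixes; everything else is a test file.
RULE_SUFFIXES = {".yml", ".yaml"}


def check_one_to_one(rule_dir):
    """Return a list of problems, one per base name that breaks the convention."""
    by_stem = defaultdict(lambda: {"rules": [], "tests": []})
    for path in sorted(Path(rule_dir).iterdir()):
        if not path.is_file():
            continue
        kind = "rules" if path.suffix in RULE_SUFFIXES else "tests"
        by_stem[path.stem][kind].append(path.name)

    problems = []
    for stem, files in sorted(by_stem.items()):
        if len(files["rules"]) != 1 or len(files["tests"]) != 1:
            problems.append(f"{stem}: rules={files['rules']} tests={files['tests']}")
    return problems


def demo(file_names):
    """Create empty files with the given names in a temp dir and check them."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in file_names:
            (Path(tmp) / name).touch()
        return check_one_to_one(tmp)


old_layout = ["accordion.haml", "accordion.vue", "accordion.yml"]
new_layout = [
    "accordion-haml-rb.haml",
    "accordion-haml-rb.yml",
    "accordion-vue-js.vue",
    "accordion-vue-js.yml",
]

print("old layout problems:", demo(old_layout))
print("new layout problems:", demo(new_layout))
```

The old layout is flagged because accordion.yml shares its base name with two test files, while the new layout produces no problems.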

Since Vue files contain JavaScript anyway, we can move all JavaScript tests into Vue files. Same for HAML and Ruby. There's no need to duplicate these in suffix-specific files, which would amount to testing Semgrep's path inclusion logic anyway.

See https://github.com/semgrep/semgrep/issues/9364 for context.

Commits

Bump Semgrep version

Run compare findings job on CI changes

Be verbose in test output

All the test change commits are omitted here, since they're all very similar to 3d469c44

Upgrade Semgrep again

Some of the more notable/relevant changes we're getting:

1.67.0:

Logged in users running semgrep ci will now run the pro engine by default.

This means we now pass --oss-only to semgrep ci, just as a precaution.

1.66.1:

We restored bash, jq, and curl in our semgrep docker image as some users were relying on them. We might remove them in the future, but in the meantime we restored the packages, and if we remove them we will announce it more loudly. We also created a new page giving more information about our policy for our docker images: https://semgrep.dev/docs/semgrep-ci/packages-in-semgrep-docker/

1.66.0:

The official semgrep docker image does not contain anymore the bash, jq, and curl utilities, to reduce its attack surface. (saf-861)

1.65.0:

Removed the extract-mode rules experimental feature. (extract_mode).

We're not actually using this mode, so it's not a huge loss, but it might have been useful.

1.45.0:

Change test inclusion/exclusion behaviour, https://github.com/returntocorp/semgrep/issues/8192.

This is the cause of the many test changes in previous commits.

1.41.0:

More Ruby parser fixes

1.39.0:

New Ruby parser

Test/integration MR

gitlab-org/gitlab!149088 (closed)

Review/run this locally

  1. Copy the pages in mr job URL (must have succeeded)
  2. Run bin/review-mr.sh <job url>
  3. If there are rules changes, check the compare findings job log reports the expected changes
  4. If there are group changes, check the compare groups job log reports the expected changes
Edited by Mark Florian
