Backport Gemnasium 2.x to GitLab 10.7 up to 11.5

Summary

In the context of #14630 (closed), it's necessary to back-port Gemnasium 2.x to all versions of GitLab starting from 10.7, when Dependency Scanning was first introduced. See #12930 (comment 213498000).

From GitLab 10.7 up to GitLab 11.5, Dependency Scanning was a Ruby project (see the 11-5-stable branch of the dependency-scanning project). It became a Go project in 11.6, and back-porting Gemnasium 2 to 11.6 is out of scope for this issue.

Further details

See #12930 (comment 213498000)

Dependency Scanning for GitLab 10.7

GitLab documentation says that Dependency Scanning was introduced in GitLab Ultimate 10.7, but DS was actually introduced in 10.5; 10.7 is when it was extracted out of a combined SAST+DS project. See https://gitlab.com/gitlab-org/gitlab-ee/issues/5105.

Looking at the source code, not much happened between 10.7 and 11.5, and this is confirmed by the changelog.

Gemnasium existed as an analyzer plugin in what used to be a Ruby project. See lib/analyzers/gemnasium.rb. The code is small and it's tested using RSpec.

The Gemnasium plugin downloads and runs gemnasium-client v1.0.1, then converts its output.

Gemnasium is disabled when either SAST_DISABLE_REMOTE_CHECKS or DEP_SCAN_DISABLE_REMOTE_CHECKS is set to true. That wouldn't change. See lib/analyze.rb.
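
As an illustration only, the check on these two variables could look like the sketch below. The helper name and the rule that only the string "true" counts as set are assumptions; the actual logic lives in lib/analyze.rb and may differ.

```ruby
# Hypothetical helper, not taken from lib/analyze.rb.
# Assumption: a flag is considered set only when its value is the string "true".
REMOTE_CHECK_FLAGS = %w[SAST_DISABLE_REMOTE_CHECKS DEP_SCAN_DISABLE_REMOTE_CHECKS].freeze

# Returns true when at least one of the two variables disables remote checks,
# in which case the Gemnasium plugin would be skipped.
def remote_checks_disabled?(env = ENV)
  REMOTE_CHECK_FLAGS.any? { |name| env[name].to_s.strip.downcase == "true" }
end
```

Passing the environment as an argument keeps the check easy to exercise from RSpec without touching the real ENV.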

There's a warning when there's no /var/run/docker.sock Unix socket, saying that Gemnasium won't be able to scan Python and Maven projects. That wouldn't change. See lib/technologies.rb.

The gemnasium-client CLI parses all the dependency files it finds, builds a list of packages, and ultimately queries the Gemnasium API to get a vulnerability list. If a Python project is detected, it delegates the generation of pipdeptree.json and then parses the file. Same goes for Maven.

gemnasium/client contains the Dockerfile used to generate the Gemnasium Python and Gemnasium Maven images. The project is no longer maintained.

We moved away from this architecture because it was too complex, thus hard to maintain. For that reason I recommend we don't do any back-port in gemnasium/client, but archive this project instead.

Documentation

Job definition

Here's the job definition in GitLab EE 10.7:

dependency_scanning:
  image: docker:stable
  variables:
    DOCKER_DRIVER: overlay2
  allow_failure: true
  services:
    - docker:stable-dind
  script:
    - export SP_VERSION=$(echo "$CI_SERVER_VERSION" | sed 's/^\([0-9]*\)\.\([0-9]*\).*/\1-\2-stable/')
    - docker run
        --env DEP_SCAN_DISABLE_REMOTE_CHECKS="${DEP_SCAN_DISABLE_REMOTE_CHECKS:-false}" \
        --volume "$PWD:/code" \
        --volume /var/run/docker.sock:/var/run/docker.sock \
        "registry.gitlab.com/gitlab-org/security-products/dependency-scanning:$SP_VERSION" /code
  artifacts:
    paths: [gl-dependency-scanning-report.json]
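
The sed expression in the script maps CI_SERVER_VERSION to the tag of the Docker image, which is why the back-port has to be published under one tag per GitLab minor version. A minimal Ruby sketch of the same mapping, where `stable_tag` is a hypothetical helper name used only for illustration:

```ruby
# Mirrors the sed expression of the job script: keep the major and minor
# numbers of CI_SERVER_VERSION and build a "<major>-<minor>-stable" tag.
# `stable_tag` is a hypothetical name, not part of the project.
def stable_tag(server_version)
  match = server_version.match(/\A(\d*)\.(\d*)/)
  # Like sed, leave the input untouched when the pattern does not match.
  return server_version unless match

  "#{match[1]}-#{match[2]}-stable"
end

puts stable_tag("10.7.3-ee") # prints "10-7-stable"
```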

The back-port

Dependency Scanning 11.5 seems to be backward compatible with 10.7: the 3 minor changes introduced between the 2 versions could be back-ported to 10.7. So we could maintain a single back-port, and publish it as 10-7-stable, 10-8-stable, and so on, up to 11-5-stable. This is good news.

Most of dependency-scanning@11-5-stable doesn't have to change, except for the gemnasium analyzer plugin implemented in lib/analyzers/gemnasium.rb. The plugin would no longer rely on the gemnasium-client, but use the analyzer projects based on Gemnasium instead:

Using the docker CLI, gemnasium.rb would start the gemnasium-python image, wait for the container to finish, and retrieve the JSON report generated by the container. Same goes for gemnasium-maven. As before, DinD is required to process Python and Maven projects; the requirements don't change.

gemnasium can't be run using Docker because this would change the requirements (right now DinD is not required for projects Gemnasium can handle directly). So we would need to release a binary for analyzer/gemnasium, and make it available to the legacy Dependency Scanning project. The binary could be an artifact generated by the pipeline of the gemnasium project, and downloaded at run-time by lib/analyzers/gemnasium.rb, just like before the back-port. But we could also improve on that by building the binary when building DS and adding it to the dependency-scanning Docker image.

The output_to_issues method would be updated to process the JSON reports generated by Gemnasium, Gemnasium Python, and Gemnasium Maven.

The back-port uses the latest versions of the Gemnasium-based projects, and can be maintained in the long term.
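
The container step described above (start the image, wait for it to finish, read its JSON report) could be sketched as follows. This is a rough illustration under stated assumptions, not the actual plugin code: the image reference, the /code mount point, the /code argument, and the report file name are all hypothetical, and volume paths may need adjusting when the docker socket of the host is shared.

```ruby
require "json"
require "open3"

# Hypothetical values: the issue does not specify the image reference
# or the name of the report written by the analyzer.
GEMNASIUM_PYTHON_IMAGE = "registry.example.com/gemnasium-python:latest"
REPORT_NAME = "gl-dependency-scanning-report.json"

# Builds the `docker run` invocation. Mounting the project at /code and
# passing /code as the argument are assumptions made for this sketch.
def docker_run_command(image, project_dir)
  ["docker", "run", "--rm", "--volume", "#{project_dir}:/code", image, "/code"]
end

# Runs the analyzer and returns its parsed JSON report.
# `docker run` without --detach blocks until the container exits,
# which covers the "wait for the container to finish" step.
def run_analyzer(image, project_dir)
  output, status = Open3.capture2e(*docker_run_command(image, project_dir))
  raise "analyzer #{image} failed: #{output}" unless status.success?

  JSON.parse(File.read(File.join(project_dir, REPORT_NAME)))
end
```

Passing the command as an argument list to Open3.capture2e avoids going through a shell, so project paths containing spaces need no extra quoting.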

Implementation plan

  • prepare QA (non-regression tests)
    • identify test projects compatible with Dependency Scanning 11.5
    • create QA pipeline for DS 11.5 for the test projects, in git branch dep-scan-11-5-stable
    • use the test projects to compare DS 10.7 (or 10.5) with DS 11.5, and check backward compatibility
  • back-port Gemnasium 2 to Dependency Scanning 11.5
    • create a v0 branch from 11-5-stable
    • convert new, generic Dependency Scanning reports
    • embed Gemnasium binary, gemnasium-db repo, and vrange library in Docker image
    • integrate Gemnasium CLI to scan projects that are not Maven or Python projects
    • integrate gemnasium-maven Docker image to scan Maven projects
    • integrate gemnasium-python Docker image to scan Python projects
  • update the CI config
    • make it build, test, and tag the image
    • trigger the pipeline of the test projects, using the dep-scan-11-5-stable branches
    • generate dependency-scanning:10-7-stable up to 11-5-stable
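
For the last step, the list of image tags to publish could be generated along these lines. The version list is an assumption (GitLab minor releases 10.7, 10.8, and 11.0 through 11.5), and the names are hypothetical.

```ruby
# Assumption: the GitLab minor releases in range are 10.7, 10.8 and 11.0 to 11.5.
SUPPORTED_VERSIONS = ([[10, 7], [10, 8]] + (0..5).map { |minor| [11, minor] }).freeze

# Returns the tags under which the single back-port image would be published.
def stable_tags(versions = SUPPORTED_VERSIONS)
  versions.map { |major, minor| "#{major}-#{minor}-stable" }
end

puts stable_tags.join(" ")
```

Since a single back-port is maintained, each tag would point at the same image.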

Improvements

  • Users benefit from the latest vulnerabilities published on gemnasium-db.
  • The Gemnasium Server can be shut off.
  • There's a single source of truth: gemnasium-db.

Testing

Non-regression QA jobs have been created for these test projects:

To run QA for Dependency Scanning (DS) 11.5, simply trigger a pipeline for the dep-scan-11-5-stable branch. To compare with DS 10.5, trigger a pipeline for the same branch, but force DS_VERSION to 10-5-stable. To run non-regression tests, run the same pipeline and set DS_VERSION to the tag of the dependency-scanning Docker image to be tested.

Risks

  • Broken CI dependency_scanning jobs on all versions of GitLab.
  • Broken Security widgets because the output format has changed.

Involved components

https://gitlab.com/gitlab-org/security-products/dependency-scanning/

Optional: Intended side effects

The back-port supports projects not previously supported, resulting in a behavior change.

Optional: Missing test coverage

Ideally, this should be tested with GitLab 10.7 up to 11.5 (full E2E integration tests), but the plan is to only check the generated reports.

/cc @gonzoyumo @NicoleSchwartz

Edited by Fabien Catteau