Make Gemnasium analyzer use gemnasium-db repo instead of Gemnasium API

Problem to solve

Gemnasium relies on a client/server architecture that has important limitations as well as high operational and maintenance costs. To summarize, this architecture makes it difficult and expensive to maintain the Gemnasium vulnerability database, and impossible to add a large number of new vulnerabilities: these are currently pending and cannot be published. Though in theory it's possible to improve or fix the existing architecture to remove these blockers, the cost seems prohibitive. See https://gitlab.com/gitlab-org/gitlab-ee/issues/12930#further-details.

Intended users

group::composition analysis

Proposal

Make the Gemnasium Analyzer use the YAML files of the gemnasium-db repository instead of the API served on https://deps.sec.gitlab.com.

gemnasium-db becomes the Single Source of Truth, and security advisories are published as soon as their YAML files are merged into the master branch of the gemnasium-db repository. The Gemnasium server (API, relational DB and services) is no longer needed.

In this new architecture, the Gemnasium Analyzer evaluates the range of affected versions to tell if a dependency is affected: it extracts the affected_range and checks whether the installed package version is in that range, using semver or any other suitable tool. It may delegate to a Ruby script to evaluate Rubygem versions, to a PHP script to evaluate PHP Composer versions, and so on.
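To make the range check concrete, here is a minimal sketch in Python. It is not the analyzer's actual code. It assumes a simplified range syntax in which constraints such as `>=1.0.0` are separated by spaces (all must hold) and alternatives are separated by `||` (any may hold), and it only handles plain `MAJOR.MINOR.PATCH` versions. The function names `parse_version`, `satisfies` and `is_affected` are hypothetical.

```python
import operator
import re
from typing import Tuple

Version = Tuple[int, int, int]

# Comparison operators supported by this simplified (assumed) range syntax.
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def satisfies(version: Version, constraint: str) -> bool:
    """Check one constraint such as '>=1.0.0'; a bare version means equality."""
    match = re.fullmatch(r"(<=|>=|<|>|=)?(.+)", constraint.strip())
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op = match.group(1) or "="
    bound = parse_version(match.group(2))
    return _OPERATORS[op](version, bound)


def is_affected(installed: str, affected_range: str) -> bool:
    """Return True if the installed version falls inside affected_range."""
    version = parse_version(installed)
    for alternative in affected_range.split("||"):
        constraints = alternative.split()
        # Every constraint of one alternative must hold for a match.
        if constraints and all(satisfies(version, c) for c in constraints):
            return True
    return False
```

For example, `is_affected("1.1.0", ">=1.0.0 <1.2.3 || >=2.0.0 <2.0.5")` is true, while the same call with `"1.2.3"` is false. Real ecosystems have their own version semantics (pre-releases, epochs, and so on), which is why the proposal allows delegating to language-specific tools.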

The gemnasium Docker image ships with a clone of gemnasium-db; it's self-contained and can scan a project without an Internet connection. In a way that's similar to bundler-audit, the Gemnasium CLI exposes a --update flag that triggers the update of the embedded gemnasium-db, using git to do so. As a consequence, the environment variable DS_DISABLE_REMOTE_CHECKS becomes irrelevant and should be deprecated.
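The update step could look roughly like the following Python sketch. Only the `--update` flag comes from this proposal; the `--db-dir` option, the default path `/gemnasium-db`, and the choice of `git pull --ff-only` are assumptions made for illustration, not the actual CLI.

```python
import argparse
import subprocess
from typing import Callable, List, Optional

# Hypothetical location of the gemnasium-db clone inside the Docker image.
DEFAULT_DB_DIR = "/gemnasium-db"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gemnasium")
    parser.add_argument(
        "--update",
        action="store_true",
        help="update the embedded gemnasium-db clone before scanning",
    )
    # Assumed option, not part of the proposal.
    parser.add_argument("--db-dir", default=DEFAULT_DB_DIR)
    return parser


def update_command(db_dir: str) -> List[str]:
    """Build the git command that fast-forwards the local clone."""
    return ["git", "-C", db_dir, "pull", "--ff-only"]


def main(argv: Optional[List[str]] = None,
         run: Callable = subprocess.run) -> int:
    args = build_parser().parse_args(argv)
    if args.update:
        # check=True raises CalledProcessError if git exits with an error.
        run(update_command(args.db_dir), check=True)
    # Scanning against the local clone would happen here; no network needed.
    return 0
```

Without `--update`, nothing touches the network, which matches the offline use case. The `run` parameter exists only so the git call can be replaced in tests.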

The Secure reports generated by the Gemnasium analyzers no longer contain links to https://deps.sec.gitlab.com, but instead contain links to the YAML files of the gemnasium-db repo. The UUIDs are unchanged to avoid losing user feedback on vulnerabilities (create issue or dismiss).
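As an illustration of the new links, the sketch below builds a URL to an advisory file from its path relative to the repository root. The project path is taken from the merge request reference in this issue; the `/blob/<ref>/<path>` URL shape, the function name `advisory_url`, and the example file path are assumptions.

```python
from urllib.parse import quote

# Project path taken from the gemnasium-db reference in this issue.
GEMNASIUM_DB_PROJECT = (
    "https://gitlab.com/gitlab-org/security-products/gemnasium-db"
)


def advisory_url(relative_path: str, ref: str = "master") -> str:
    """Return a link to an advisory YAML file in gemnasium-db.

    relative_path is the file's path from the repository root. The URL
    layout used here is an assumption about how GitLab serves files.
    """
    path = quote(relative_path.lstrip("/"))  # keeps '/' between segments
    return f"{GEMNASIUM_DB_PROJECT}/blob/{quote(ref, safe='')}/{path}"
```

Calling `advisory_url("gem/example/CVE-0000-0000.yml")` with this made-up path yields a link under `.../gemnasium-db/blob/master/`.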

The dependency-scanning project is unchanged.

Permissions and Security

No change.

Documentation

update GitLab documentation !17577 (merged)

update gemnasium-db documentation gitlab-org/security-products/gemnasium-db!102 (merged)

  • update the introduction and remove the sentence "It's not the Gemnasium DB itself but can be considered as a single source of truth: this is where new advisories are submitted so it's always in sync with the Gemnasium DB."
  • revisit the publishing section, even though we may eventually keep it the way it is
  • remove the uuid field from the YAML schema and the contributing guide, or simply say that it's been deprecated
  • remove paragraph saying that Gemnasium also performs checks

Testing

No change.

Implementation plan

What does success look like, and how can we measure that?

  • Security advisories are published/updated as soon as their YAML files are added to/updated in gemnasium-db.
  • Gemnasium is shipped as a self-contained Docker image that doesn't need an Internet connection.
  • Gemnasium can be synced with gemnasium-db at run-time, when running the gemnasium Docker image.

The following issues are solved and can be closed:

The following issues become irrelevant and can be closed:

The following issues are partially solved (gemnasium):

  • Support air-gapped (offline) Dependency Scanning for on-prem instances #12726 (closed)

Also, it drastically reduces the cost of implementing these issues:

  • Make Dependency Scanning compatible with private registries #6464
  • Add Severity level to Gemnasium vulnerabilities #8213 (closed)
  • Dependency Scanning for nuget #8102 (closed)
  • Interpreter and compiler support for Dependency Scanning #10588 (closed)

Links / references

https://gitlab.com/gitlab-org/gitlab-ee/issues/12930

Edited by Fabien Catteau