Assess accuracy of semver_dialects using gemnasium/vrange

Problem to solve

We'd like to use semver_dialects to implement Continuous Scanning in the backend, and to support all the languages and package managers Gemnasium (Dependency Scanning) supports today. However, semver_dialects might not be as accurate as Gemnasium's vrange library, and this might result in false positives and false negatives. See #363073 (comment 970572115)

Proposal

Leverage gemnasium-db to compare semver_dialects to vrange, report discrepancies, and create follow-up issues to address them.

For each YAML file of gemnasium-db, do the following:

  1. Query the package registry to list package versions.
  2. Find affected, fixed, and non-affected versions using gemnasium/vrange.
  3. Find affected, fixed, and non-affected versions using semver_dialects.
  4. Report inconsistencies.

We can either query the API of the package registry or use the package manager to list the versions of a package.
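
As an illustration of the registry-API option, here is a minimal sketch for one registry only. It assumes the RubyGems.org versions endpoint, which returns a JSON array of objects with a `number` field; the helper names `parse_rubygems_versions` and `fetch_rubygems_versions` are made up for this example, and other registries would need their own adapters.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Extracts version strings from a RubyGems.org "versions" API response body.
# Assumption: each array entry carries its version under the "number" key.
def parse_rubygems_versions(body)
  JSON.parse(body).map { |entry| entry.fetch('number') }
end

# Hypothetical helper: fetches the published versions of a gem over HTTP.
# It is kept separate from parsing so the parser can be tested offline.
def fetch_rubygems_versions(gem_name)
  uri = URI("https://rubygems.org/api/v1/versions/#{gem_name}.json")
  parse_rubygems_versions(Net::HTTP.get(uri))
end
```

Keeping the HTTP call apart from the parsing makes it straightforward to swap in the package manager CLI later if that turns out to be more reliable for a given ecosystem.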

To find the affected and non-affected versions, go through the versions listed on the package registry, and evaluate each one against the affected_range of the YAML file.

Also, find package versions that match the fixed_versions of the YAML file.
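
The comparison in steps 2 to 4 could be sketched roughly as follows. Both matchers are hypothetical callables taking `(version, range)`; in the real assessment they would wrap gemnasium/vrange and semver_dialects. The stand-ins below (RubyGems' own requirement matching as the reference, and a deliberately naive string comparison) only demonstrate how a discrepancy is detected, and the `<1.10.0` range is a toy example rather than actual advisory syntax.

```ruby
require 'rubygems'

# One record per version on which the two matchers disagree.
Discrepancy = Struct.new(:version, :vrange, :semver_dialects)

# Evaluates every version against the range with both matchers and
# returns only the versions where the two results differ.
def find_discrepancies(versions, affected_range, vrange_matcher, dialects_matcher)
  versions.each_with_object([]) do |version, found|
    expected = vrange_matcher.call(version, affected_range)
    actual = dialects_matcher.call(version, affected_range)
    found << Discrepancy.new(version, expected, actual) unless expected == actual
  end
end

# Stand-in for vrange (assumption): RubyGems requirement matching.
reference = lambda do |version, range|
  Gem::Requirement.new(range).satisfied_by?(Gem::Version.new(version))
end

# Stand-in for the library under test (assumption): a naive string
# comparison that only understands "<X" and gets 1.9.0 vs 1.10.0 wrong.
naive = ->(version, range) { version < range.delete_prefix('<') }

find_discrepancies(%w[1.0.0 1.9.0 1.10.0 2.0.0], '<1.10.0', reference, naive).each do |d|
  puts "MISMATCH version=#{d.version} vrange=#{d.vrange} semver_dialects=#{d.semver_dialects}"
end
```

Passing the matchers in as callables keeps the comparison logic independent of how each library is actually invoked (CLI, gem call, or container).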

See #220286 (comment 1044076127)

Implementation plan

The proposal above is likely the best way to assess the semver_dialects gem. The solution will also help our team with Ensure consistency b/w semver_dialects and vran... (#369239).

  • add vrange command to gemnasium allowing queries given a range and a version
  • create a project for running this test
    • Ruby is the best fit here, since semver_dialects is a Ruby gem and this matching capability will be used in the Rails monolith
  • check out advisory db
  • iterate over each package manager directory (e.g. pypi, gem)
    • iterate over each directory corresponding to a package
      • get list of official versions for package (to simplify configuration, try to use Docker as much as possible; this can then be used in Ensure consistency b/w semver_dialects and vran... (#369239) under the docker-dind image)
      • iterate over each advisory
        • check version of package against advisory.affected_range
          • using vrange
          • using semver_dialects
        • capture matches, mismatches and which range functionality they belong to using a greppable format
          • capture to a db that can be saved to disk and queried via jq (for example)
  • commit code used to generate this check to make it repeatable for new semver types - https://gitlab.com/gitlab-org/security-products/tests/semver-assessment
  • report results - #369238 (comment 1114975584)
  • create follow-up issues
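
The directory walk and the greppable capture format from the plan above might look something like this. It is a sketch under assumptions: advisories are taken to live at `<package_type>/<package>/<advisory>.yml` under the checkout, and the function names and result field names (`match`, `vrange`, `semver_dialects`, and so on) are invented for illustration.

```ruby
require 'date'
require 'json'
require 'pathname'
require 'yaml'

# Yields package_type, package, relative advisory path, and the parsed YAML
# for every advisory file under db_root (assumed directory layout, see above).
def each_advisory(db_root)
  root = Pathname.new(db_root)
  Dir.glob(root.join('*', '**', '*.yml').to_s).sort.each do |path|
    relative = Pathname.new(path).relative_path_from(root)
    package_type, *package_parts, _file = relative.each_filename.to_a
    # Date and Time are permitted in case an advisory has unquoted dates.
    advisory = YAML.safe_load(File.read(path), permitted_classes: [Date, Time])
    yield package_type, package_parts.join('/'), relative.to_s, advisory
  end
end

# Builds one JSON object per evaluated version. Writing one object per line
# keeps the output greppable and lets jq filter it directly.
def result_line(package_type:, package:, advisory:, affected_range:, version:, vrange:, semver_dialects:)
  JSON.generate(
    package_type: package_type,
    package: package,
    advisory: advisory,
    affected_range: affected_range,
    version: version,
    vrange: vrange,
    semver_dialects: semver_dialects,
    match: vrange == semver_dialects
  )
end
```

If the lines are appended to a file such as `results.jsonl` (a hypothetical name), mismatches can then be listed with `jq -c 'select(.match == false)' results.jsonl` or with a plain `grep '"match":false'`.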

/cc @julianthome @bwill

Edited by Oscar Tovar