Component name mismatch between existing vs new analyzer

Summary

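The new analyzer reports component names differently from the existing analyzer (for example, django instead of Django). Advisory data (e.g., pm_affected_packages) is aligned with the existing analyzer's naming, so components reported by the new analyzer do not match any advisories.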

Steps to reproduce

  1. Create a project with the following .gitlab-ci.yml:
stages:
 - test

dependency_scanning:
  stage: test
  image: alpine:3.20
  script: echo "Running stubbed dependency scanning job."
  artifacts:
    access: "developer"
    paths:
      - "**/gl-sbom-*.cdx.json"
    reports:
      cyclonedx: "**/gl-sbom-*.cdx.json"
  2. Create a new MR with the following SBOM file: gl-sbom.cdx.json. It contains the output of the existing analyzer.
  3. Create another new MR with the following SBOM file: gl-sbom-pypi-pipcompile.cdx.json. It contains the output of the new analyzer.

Example Project

Please see the steps described above.

What is the current bug behavior?

Only the pipeline related to (2) displays security findings.

What is the expected correct behavior?

Pipelines related to both (2) and (3) should display similar security findings.

Relevant logs and/or screenshots

Extract from the CycloneDX report produced by the new analyzer:

    {
      "name": "django",
      "version": "1.11.4",
      "purl": "pkg:pypi/django@1.11.4",
      "type": "library",
      "bom-ref": "pkg:pypi/django@1.11.4",
      "properties": [
        {
          "name": "reachability",
          "value": "not_available"
        }
      ]
    },

Extract from the CycloneDX report produced by the existing analyzer:

    {
      "name": "Django",
      "version": "1.11.4",
      "purl": "pkg:pypi/Django@1.11.4",
      "type": "library",
      "bom-ref": "pkg:pypi/Django@1.11.4",
      "properties": [
        {
          "name": "gitlab:dependency_scanning_component:reachability",
          "value": "unknown"
        }
      ]
    },

DB entries:

gitlabhq_dblab=# select count(distinct(pm_advisory_id)) from pm_affected_packages where package_name = 'Django';
 count
-------
   129
(1 row)

gitlabhq_dblab=# select count(distinct(pm_advisory_id)) from pm_affected_packages where package_name = 'django';
 count
-------
     0
(1 row)
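
The two counts above are consistent with a case-sensitive exact match on package_name. A minimal Python sketch of that effect, using an in-memory stand-in for pm_affected_packages (the data and the helper name count_advisories are hypothetical and only mirror the two queries):

```python
# Hypothetical in-memory stand-in for pm_affected_packages:
# (pm_advisory_id, package_name) pairs stored under the existing analyzer's spelling.
AFFECTED_PACKAGES = [(advisory_id, "Django") for advisory_id in range(1, 130)]


def count_advisories(package_name: str) -> int:
    """Count distinct advisory ids whose package_name matches exactly (case-sensitive)."""
    return len({adv_id for adv_id, name in AFFECTED_PACKAGES if name == package_name})


existing_analyzer_count = count_advisories("Django")  # spelling used by the existing analyzer
new_analyzer_count = count_advisories("django")       # spelling used by the new analyzer
```

With this stand-in data, the first lookup finds all 129 advisories and the second finds none, which matches the behavior seen in the pipelines.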

Possible fixes

At first glance, we can either migrate the existing package metadata, normalizing it to conform to the new standard, or update the new analyzer to keep the current behavior.

Normalising on PMDB

License flow

  • Update interfacer so that Python packages are normalised correctly. More specifically, consecutive - characters should be replaced by a single -. This will fix the PyPI name normalization for licenses.
  • Deploy interfacer on dev and prod
  • Run prod pypi feeder with IGNORE_CURSOR to generate the new names
  • Run prod pypi exporter v2 to export the new names
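
For reference, PyPI name normalization as described in PEP 503 lowercases the name and collapses runs of -, _ and . into a single -. A minimal Python sketch of that rule follows. It illustrates the rule only and is not the interfacer's actual code; the function name is hypothetical.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Normalize a PyPI project name following the PEP 503 rule."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()
```

For example, normalize_pypi_name("Django") returns "django", and normalize_pypi_name("zope--interface") returns "zope-interface".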

Advisory flow

  • Update exporter so that it normalises the PyPI names while exporting data.
  • Deploy exporter on dev and prod
  • Run prod exporter job only for REGISTRY=pypi and EXPORT_ALL=true.

Verify the results

  • [] For licenses you can
  • [] For advisories you can make sure that PyPI names are lowercased and contain no consecutive - characters. For example, Django advisories.
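
The advisory check could be automated with a small predicate over the exported names. The sketch below assumes the names are available as plain strings; is_normalized and the sample list are hypothetical, and the check covers only the two conditions named above (lowercase, no consecutive -).

```python
def is_normalized(name: str) -> bool:
    """Return True if the name is lowercased and has no consecutive '-' characters."""
    return name == name.lower() and "--" not in name


# Hypothetical sample of exported package names to check.
exported_names = ["django", "Django", "zope--interface", "zope-interface"]
not_normalized = [name for name in exported_names if not is_normalized(name)]
```

Any name left in not_normalized would indicate an export that still carries the old spelling.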

Normalising on the PMDB side is not the way to go. Read more about this in this thread.

Implementation Plan (based on this thread)

  • backend: Use normalized package_name during pm_affected_package ingestion and on CVS (!183995 - merged) • Zamir Martins • 17.10
  • database: Normalize pm_affected_packages.package_name (!183732 - merged) • Zamir Martins • 17.11
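
The idea behind the plan, normalizing the name both when advisories are ingested and when components are looked up, can be sketched in Python as follows. This is an illustration under assumptions, not the code from the merge requests: the AffectedPackages class, its methods, and the use of the PEP 503 rule for PyPI names are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict


def normalize_pypi_name(name: str) -> str:
    """PEP 503 style normalization: collapse runs of '-', '_', '.' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


class AffectedPackages:
    """Hypothetical in-memory stand-in for pm_affected_packages."""

    def __init__(self) -> None:
        self._advisories = defaultdict(set)

    def ingest(self, advisory_id: int, package_name: str) -> None:
        # Normalize at ingestion so stored names use a single canonical spelling.
        self._advisories[normalize_pypi_name(package_name)].add(advisory_id)

    def advisories_for(self, component_name: str) -> set:
        # Normalize at lookup so either analyzer's spelling finds the same rows.
        return set(self._advisories.get(normalize_pypi_name(component_name), set()))


store = AffectedPackages()
store.ingest(1, "Django")
store.ingest(2, "django")
```

Because both sides go through the same function, store.advisories_for("Django") and store.advisories_for("django") return the same set of advisory ids.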
Edited Mar 10, 2025 by Zamir Martins