Static reachability should be able to run without requiring a docker image

Problem description

Currently, Static Reachability supports Java and Python and requires the user to build a docker image containing the source code. An example pipeline looks like the following:

(screenshot: example pipeline)

To be more precise, the `gitlab-enrich-cdx-results` job runs on the docker image built in the previous stage. The analyzer behind this job is `sca-to-sarif-matcher`; you can see how it works in this post.

GitLab Advanced SAST, with a custom SCA ruleset and SCA enabled (also known as Oxeye LightZ-AIO), runs after the build stage in order to extract the loaded/in-use packages in the customer's repository.

In other words, the analyzer requires information about which packages are loaded at runtime.

This approach brings a limitation: users who want to use Static Reachability are required to build a docker image.

Proposal

Our current understanding is that `sca-to-sarif-matcher` requires, at least for Python, a docker image to work around the fact that PyPI packages can be named differently in the package manager file and in the actual code. For example, PyYAML -> yaml.

A first step for Beta could be to run the analyzer without the ability to detect those packages, but also without requiring the user to build a docker image. This would buy us time to update the analyzer to fetch the information by other means. One idea is to download a package via a purl. Example for PyYAML. We should investigate, though, how to deal with different Python versions, since the same package version might differ between Python versions. A possible simplification is to assume that import names do not change between Python versions.
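The purl-based idea above could be sketched roughly as follows. The purl format (`pkg:pypi/<name>@<version>`) follows the package-url spec; the helper function and the follow-up use of PyPI's JSON API are assumptions for illustration, not the analyzer's actual code:

```python
# Sketch: resolving a PyPI purl to a downloadable package.
# parse_pypi_purl is a hypothetical helper, not part of sca-to-sarif-matcher.

def parse_pypi_purl(purl: str) -> tuple[str, str]:
    """Split a purl like 'pkg:pypi/pyyaml@6.0.1' into (name, version)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a purl: {purl}")
    pkg_type, _, name_version = rest.partition("/")
    if pkg_type != "pypi":
        raise ValueError(f"not a PyPI purl: {purl}")
    name, _, version = name_version.partition("@")
    return name, version

name, version = parse_pypi_purl("pkg:pypi/pyyaml@6.0.1")
# The package metadata (and from there the sdist/wheel, whose top_level.txt
# reveals the import name, e.g. 'yaml' for PyYAML) could then be fetched from:
metadata_url = f"https://pypi.org/pypi/{name}/{version}/json"
```

The open question from above still applies: a wheel is built per Python version, so the analyzer would either need to pick the matching artifact or assume the import names are version-independent.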

Another proposal could be to start with an initial mapping, something like:

```python
PACKAGE_IMPORT_MAPPING = {
    'PyYAML': 'yaml',
    'python-dateutil': 'dateutil',
    'beautifulsoup4': 'bs4',
    'scikit-learn': 'sklearn',
    'pillow': 'PIL',
    'opencv-python': 'cv2',
    'python-dotenv': 'dotenv',
    'setuptools': 'pkg_resources',
    'requests-oauthlib': 'requests_oauthlib',
    'python-json-logger': 'pythonjsonlogger',
    'python-jose': 'jose',
    'pyOpenSSL': 'OpenSSL',
    'python-multipart': 'multipart',
    'typing-extensions': 'typing_extensions'
}
```

Why do we have this naming issue?

This happens because the package name on PyPI (the Python Package Index) is independent of the module name used in the code: the package name is used for distribution and installation, while the module name is what gets imported in Python code. However, this mismatch is not common. The majority of Python packages use the same name for both the package and the import.
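Since most packages share the name between package and import, a lookup could fall back to a normalized form of the package name whenever it is not in the mapping. A minimal sketch (the `import_name_for` helper and the small mapping subset are hypothetical, not part of the analyzer):

```python
# Subset of the proposed mapping, for illustration.
PACKAGE_IMPORT_MAPPING = {
    'PyYAML': 'yaml',
    'beautifulsoup4': 'bs4',
    'scikit-learn': 'sklearn',
}

def import_name_for(package: str) -> str:
    """Return the import name for a PyPI package name."""
    if package in PACKAGE_IMPORT_MAPPING:
        return PACKAGE_IMPORT_MAPPING[package]
    # Fallback for the common case where the names match: lowercase and
    # replace dashes with underscores (e.g. 'requests' -> 'requests').
    return package.lower().replace('-', '_')
```

The fallback would still miss unmapped outliers (that is the gap the purl-based lookup is meant to close), but it keeps the common case correct without any docker image.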

Requirements

  • SBOM reports for unsupported package managers should be ignored
  • An SBOM report that includes a dependency list should be used. If an SBOM with only components is provided, then we cannot mark transitive dependencies as `in_use` and they will appear as `unknown`
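The second requirement can be checked up front. In the CycloneDX JSON format, the dependency graph lives in a top-level `dependencies` array alongside `components`; a minimal sketch of such a guard (the helper name is an assumption):

```python
import json

def has_dependency_graph(sbom_json: str) -> bool:
    """True if a CycloneDX JSON SBOM carries a non-empty dependency graph."""
    sbom = json.loads(sbom_json)
    return bool(sbom.get("dependencies"))

# Only the first SBOM lets us resolve transitive dependencies;
# the second would leave them marked as unknown.
sbom_with_graph = '{"components": [], "dependencies": [{"ref": "pkg:pypi/pyyaml@6.0.1"}]}'
sbom_components_only = '{"components": [{"name": "requests"}]}'
```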

/cc @tkopel @idawson

Edited Mar 14, 2025 by Nick Ilieskou