All dependency security scanning jobs can be triggered without any supported files

Summary

Normally, each dependency scanning job is triggered only when the repository contains one of that job's supported files. Last week, however, we scanned a Maven repository and it triggered all of the dependency scanning jobs, more than we expected, as shown below:

image

In fact, this repository is purely a Maven repository, with no Python or Go files in it. We studied the dependency scanning template and, taking gemnasium-python-dependency_scanning as an example, found that its matching rule behaves in an unexpected way:

.gemnasium-python-shared-rule:
  exists:
    - '**/requirements.txt'
    - '**/requirements.pip'
    - '**/Pipfile'
    - '**/Pipfile.lock'
    - '**/requires.txt'
    - '**/setup.py'
    - '**/poetry.lock'

We found that the exists rule performs at most 10,000 checks. Because this rule lists 7 patterns, a repository with more than 10000/7 ≈ 1428 files exhausts that limit, after which the rule is assumed to match the supported files and all jobs are triggered. In other words, if the repository contains more than 10,000 files, whether it is a code repository or just a repository of text files, the scan runs in every job.
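
The arithmetic behind the 1428 figure can be written out as follows. This is a sketch that assumes each file is compared against each of the 7 patterns and that every such comparison counts toward the 10,000-check limit; the symbols N_files, N_patterns and N_checks are introduced here only for illustration.

```latex
\[
N_{\text{checks}} = N_{\text{files}} \times N_{\text{patterns}}, \qquad
\left\lfloor \frac{10000}{7} \right\rfloor = 1428
\]
\[
1428 \times 7 = 9996 \le 10000, \qquad 1429 \times 7 = 10003 > 10000
\]
```

Under that assumption, a repository with 1429 or more files and none of the listed files would already exhaust the limit for this single rule.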

image

I ran a test: I created a repository containing 10,001 text files, and it triggered all of the jobs.

image

image

I don't think this behavior makes sense, even though we can disable the irrelevant scanning jobs with a variable such as:

DS_EXCLUDED_ANALYZERS: "gemnasium,gemnasium-python" 
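
For context, a minimal .gitlab-ci.yml sketch of this workaround for a Maven-only project might look like the following. The include path is an assumption and may differ between GitLab versions; the variable and its value are the ones quoted above.

```yaml
# Assumed template path; adjust it to the location your GitLab version uses.
include:
  - template: Security/Dependency-Scanning.gitlab-ci.yml

variables:
  # Exclude the analyzers that do not apply to a Maven-only repository,
  # so their jobs are not created even when the exists rule is assumed to match.
  DS_EXCLUDED_ANALYZERS: "gemnasium,gemnasium-python"
```

This only hides the symptom: it has to be maintained per project, and it does nothing to tell the user why the extra jobs appeared in the first place.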

Proposal

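At a minimum, we could output an informational message telling the customer that, because the repository contains more than 10,000 files or the rules were checked more than 10,000 times, the supported-file check was cut short for performance reasons, and suggesting that they apply the rules to disable the jobs they do not need.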

Edited by cheng lei