Implement the Dependency graph spec of the CycloneDX format in Gemnasium.
This approach has several upsides:
This makes it possible to leverage existing tools that already report this data in this format, rather than implementing it from scratch in Gemnasium. For example, Trivy already provides that information by following this spec. This will be closely related to the outcome of Spike: Replace Gemnasium with open source nativ... (#434143).
This gets us closer to supporting SBOM reports generated by 3rd parties and other custom jobs (there are other blockers for this though).
This helps support future features based on the dependency graph information (e.g. dependency graph visualization, showing other ancestor paths, identifying critical components that are heavily depended upon within a project or group/company, etc.).
While this format allows the backend to generate a full dependency graph and do a lot of things, the first iteration could be simpler and only on par with the currently provided dependency_path, which is actually one arbitrarily selected "shortest path" of ancestors. The Rails platform will be free to evolve at its own pace without requiring changes to the SBOM report.
The full dependency graph could be generated asynchronously, after the SBOM ingestion. This would limit the impact on performance during SBOM ingestion and provide more flexibility in what we want to store. For instance, today only vulnerable dependencies have the dependency_path displayed. By generating the dependency graph asynchronously, we can know which components have associated vulnerabilities and limit the scope to those (to limit storage usage, if this is still a concern).
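To illustrate how the backend could derive one "shortest path" of ancestors from the CycloneDX dependency graph, here is a minimal sketch. It assumes the `dependencies` section has already been parsed into a list of objects with `ref` and `dependsOn` keys (as defined by the CycloneDX spec); the function name and the sample refs are hypothetical, and this is not how the Rails backend is implemented.

```python
from collections import deque


def shortest_ancestor_path(dependencies, root_ref, target_ref):
    """Return one shortest path of refs from root_ref to target_ref, or None."""
    # Adjacency list: ref -> list of refs it directly depends on.
    graph = {dep["ref"]: dep.get("dependsOn", []) for dep in dependencies}
    queue = deque([[root_ref]])
    visited = {root_ref}
    while queue:
        path = queue.popleft()
        if path[-1] == target_ref:
            return path
        for child in graph.get(path[-1], []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None


# Hypothetical graph: "d" is reachable through "a" -> "c" and through "b".
sbom_dependencies = [
    {"ref": "app", "dependsOn": ["a", "b"]},
    {"ref": "a", "dependsOn": ["c"]},
    {"ref": "b", "dependsOn": ["d"]},
    {"ref": "c", "dependsOn": ["d"]},
    {"ref": "d", "dependsOn": []},
]
path = shortest_ancestor_path(sbom_dependencies, "app", "d")
print(path)
```

A breadth-first search returns the first shortest path it finds, so when several paths have the same length the result depends on the order of `dependsOn`, which matches the "arbitrarily selected" behavior described above.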
Implementation plan
Dependency graph information can be generated directly from scanner.File.Dependencies, because the cyclonedx package can generate a purl and a bom-ref directly from these.
There's no conflict because Gemnasium generates one CDX SBOM per input scanner.File.
It isn't useful to process the internal dependency graph that's currently used to generate the .dependency_files field of the Dependency Scanning report.
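As a rough illustration of the plan above, the sketch below builds the `components` and `dependencies` sections of a CycloneDX document from a flat list of packages. The input structure (dicts with `name`, `version`, and `requires`) is a hypothetical stand-in for scanner.File.Dependencies, and using the purl as the bom-ref is an assumption made for this example; only the output keys (`bom-ref`, `ref`, `dependsOn`) come from the CycloneDX spec.

```python
import json


def purl(pkg_type, name, version):
    # Simplified purl; real purls also need namespace handling and escaping.
    return f"pkg:{pkg_type}/{name}@{version}"


def build_sbom(pkg_type, packages):
    """Build a minimal CycloneDX-like document with a dependency graph."""
    components = []
    dependencies = []
    for pkg in packages:
        ref = purl(pkg_type, pkg["name"], pkg["version"])
        components.append({
            "type": "library",
            "name": pkg["name"],
            "version": pkg["version"],
            "purl": ref,
            "bom-ref": ref,  # assumption: the purl doubles as the bom-ref
        })
        dependencies.append({
            "ref": ref,
            "dependsOn": sorted(purl(pkg_type, n, v) for n, v in pkg["requires"]),
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": components,
        "dependencies": dependencies,
    }


# Hypothetical input for a single scanned file.
packages = [
    {"name": "express", "version": "4.18.2",
     "requires": [("body-parser", "1.20.1"), ("accepts", "1.3.8")]},
    {"name": "body-parser", "version": "1.20.1", "requires": []},
    {"name": "accepts", "version": "1.3.8", "requires": []},
]
sbom = build_sbom("npm", packages)
print(json.dumps(sbom["dependencies"], indent=2))
```

Because one SBOM is generated per input file, refs only need to be unique within that single document.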
If so, what about marking this issue (add dependency graph info to SBOM) as blocked by the two other issues?
Also, we should probably update these two other issues, to explicitly say that adding the dependency graph to the CycloneDX SBOM is out of scope, and that it will be handled in a separate issue. WDYT?
Would we do that after the issues about creating SBOM generators for Nuget, Yarn, and Sbt?
Yes, I think we should implement the SBOM generators first and then add dependency graph information afterwards.
If so, what about marking this issue (add dependency graph info to SBOM) as blocked by the two other issues?
done
Also, we should probably update these two other issues, to explicitly say that adding the dependency graph to the CycloneDX SBOM is out of scope, and that it will be handled in a separate issue. WDYT?
@adamcohen Related to this, do we have issues about updating the Rails backend so that it extracts graph information from the CycloneDX SBOM instead of the existing Dependency Scanning report, and then removing this graph from the DS report (3-step migration)?
Maybe we could add these issues to Show paths to dependencies (&7530), even though this is more like a technical migration, and not so much about showing dependency paths.
@thiagocsf We could do something similar for Category:Container Scanning actually. If we were to add dependency graph information to CycloneDX SBOMs generated by Container Scanning, then the Dependency List eventually would show dependency paths to system-level dependencies the same way it shows dependency paths to application-level dependencies, for instance.
@fcatteau, does a container image typically have enough information to populate the dependency graph?
The scanners we integrated into the CS analyzer (Trivy, Grype) can read both the OS package and language-specific databases. The package databases have dependency information that can be extracted, but can we determine the triggered path based on a layer snapshot?
(I'm missing domain expertise in DS, so it's possible my question is not even relevant)
@thiagocsf Great questions! Let's consider Debian.
AFAIK there isn't a file that gives the dependency graph. So we would have to extract the Depends fields from package metadata (Debian control file), and build a graph. I guess this is what apt-get install does to install the dependencies.
We shouldn't use apt-cache depends to list package dependencies because that would be too slow, and because it probably connects to Debian repos; this wouldn't work in offline environments. We can't rely on /var/cache/apt or /var/lib/apt/lists, because these directories might be cleaned up after installing the packages, to save space. And yet, the information must be available somewhere, because apt-get remove can figure out the dependencies that need to be removed even when both /var/cache/apt and /var/lib/apt/lists are empty, and it doesn't seem to access the Debian repos (to be checked).
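To make the idea of building a graph from Depends fields more concrete, here is a minimal sketch that parses text in the Debian control file format and extracts the direct dependencies of each package. Where that text comes from in a container image is left open, as in the discussion above; the function name and sample stanzas are hypothetical. As a simplification, version constraints are dropped and every alternative in an `a | b` clause is recorded as a dependency.

```python
import re


def parse_depends(control_text):
    """Map each package name to the package names listed in its Depends field."""
    graph = {}
    for stanza in re.split(r"\n\s*\n", control_text.strip()):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t"):
                continue  # continuation lines are ignored in this sketch
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        name = fields.get("Package")
        if not name:
            continue
        deps = []
        for clause in fields.get("Depends", "").split(","):
            for alternative in clause.split("|"):
                # Drop "(>= 1.0)" constraints and ":any" style qualifiers.
                dep = re.sub(r"\(.*?\)", "", alternative).strip().split(":")[0]
                if dep and dep not in deps:
                    deps.append(dep)
        graph[name] = deps
    return graph


sample = """\
Package: curl
Version: 7.88.1-10
Depends: libc6 (>= 2.34), libcurl4 (= 7.88.1-10), zlib1g (>= 1:1.1.4)

Package: libcurl4
Version: 7.88.1-10
Depends: libc6 (>= 2.17), libssl3 (>= 3.0.0) | libssl1.1
"""
graph = parse_depends(sample)
print(graph)
```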
can we determine the triggered path based on a layer snapshot?
We can use apt-mark showmanual to list the packages that have been installed manually. This works even after cleaning up /var/cache/apt, but it no longer works after cleaning up /var/lib/apt.
In conclusion, it seems possible to track the dependencies up to the packages that have been installed explicitly, but not if /var/lib/apt is cleaned up when building the image. None of that seems easy though.
IMO this would be something to look into for Lovable. Since DS is slightly different, and Sam is acting PM for both groups, maybe he'll have a different opinion.
@sam.white something to keep in our back pockets in case you see any customer demand for this in CS
@thiagocsf I agree this is not a near-term priority. As long as we leave the door open for the possibility of implementing it in the future, we do not need to do anything else here for now.
CycloneDX doesn't track the type of a dependency, and if we only rely on CycloneDX dependencies then we can't distinguish dev dependencies, to begin with.
Dependencies have no metadata and it's not possible to distinguish development dependencies from production ones. However, dependency metadata could be supported by extending the CycloneDX format.
A workaround would be to extend the JSON schema for CycloneDX to support this. In parallel we should contact the CycloneDX team to work on making that part of the official specification.
That said, we can leverage the properties of the SBOM component to collect all the scopes/groups in which a component is used. We would flatten the graph in a way.
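The flattening idea could look roughly like the sketch below: every scope or group in which a component is used is recorded as a property on that component, using the name/value `properties` list that CycloneDX components support. The property name `example:dependency:scope` and the input shape are hypothetical and only serve the illustration.

```python
def attach_scopes(components, usages, property_name="example:dependency:scope"):
    """Record every scope a component is used in as a component property."""
    scopes_by_ref = {}
    for ref, scope in usages:
        scopes_by_ref.setdefault(ref, set()).add(scope)
    for component in components:
        properties = component.setdefault("properties", [])
        # CycloneDX allows several properties with the same name.
        for scope in sorted(scopes_by_ref.get(component["bom-ref"], ())):
            properties.append({"name": property_name, "value": scope})
    return components


# Hypothetical components and (bom-ref, scope) usages.
components = [
    {"bom-ref": "pkg:npm/lodash@4.17.21", "name": "lodash"},
    {"bom-ref": "pkg:npm/jest@29.0.0", "name": "jest"},
]
usages = [
    ("pkg:npm/lodash@4.17.21", "production"),
    ("pkg:npm/lodash@4.17.21", "development"),
    ("pkg:npm/jest@29.0.0", "development"),
]
attach_scopes(components, usages)
print(components)
```

With this approach a component used only in development can be identified from its properties, but the information about which edge of the graph belongs to which scope is lost.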
CycloneDX doesn't track the type of a dependency, and if we only rely on CycloneDX dependencies then we can't distinguish dev dependencies, to begin with.
@fcatteau It seems like this is out of scope for this issue, since we don't currently report this information in the gl-dependency-scanning-report.json file. What do you think about creating another issue to handle tracking dependency types, which can be implemented after this issue?
@adamcohen Yes, we definitely need another issue about tracking dependency types in the CycloneDX SBOMs generated by Gemnasium. I have to create one. TODO
@adamcohen I've assigned this issue to the %15.4 milestone. I did it a bit arbitrarily, knowing that we already have some issues in this epic assigned to %15.3. Could you please verify the order of issues in &8206 (closed) so that they are in the order in which they should be addressed? And could you please reassign this issue to %15.3 if needed?
Adam Cohen changed the description
Problem they are trying to solve: Customer wants to distinguish between direct and transitive dependencies, however the CycloneDX SBOM format doesn’t have any attribute to mark transitive dependencies (Java SBOMs).
Current solution for this problem: They will look into introducing their own SBOM tool.
Impact to the customer of not having this: Customer would ideally like to use GitLab's SBOM tool, but without this they're unable to have everything on one platform, which prompts the need to introduce another tool.
Questions:
Do we have any plans to address this in the near term?
@gonzoyumo, once this information is available on the SBOM report, what do we need on the rails side to use it? E.g.: ingest to sbom tables, add finder(s), use data on existing controller(s)
@thiagocsf the dependency graph is the result of a dependency resolution contextual to a given project, this can't be done in the Package Metadata DB.
So this has to be handled in the SBOM generator, which currently is Gemnasium. Maybe there is an opportunity to do a quick review of the SBOM generators we want to replace Gemnasium with: if they provide such information, it might be quicker to swap them in for our current implementation than to do it ourselves manually?
From the Rails side, the data must be stored during the ingestion like other component data contextual to this project and this particular scan execution. Currently the dependency_path is an ordered list of references to other components reported in the same SBOM document, which uses an iid (int) scoped to that document. That's pretty simple, but in the context of the DB storage I don't think we have something similar to this iid, and we'll have to find a different approach to efficiently keep track of these relations between components.
After discussing with the team here is an updated proposal.
CA implements the dependency graph spec of the CycloneDX format, which looks straightforward and should be doable within a milestone.
TI implements the parsing and storage of this information in the Rails backend. This part might take more work, but several options are available depending on capacity and product goal for the first iteration.
This approach has several upsides:
This makes it possible to leverage existing tools that already report this data in this format, rather than implementing it from scratch in Gemnasium. For example, Trivy already provides that information by following this spec. This will be closely related to #434143 as Fabien mentioned.
This gets us closer to supporting SBOM reports generated by 3rd parties and other custom jobs (there are other blockers for this though).
This helps support future features based on the dependency graph information (e.g. dependency graph visualization, showing other ancestor paths, identifying critical components that are heavily depended upon within a project or group/company, etc.).
While this format allows the backend to generate a full dependency graph and do a lot of things, the first iteration could be simpler and only on par with the currently provided dependency_path, which is actually one arbitrarily selected "shortest path" of ancestors. The Rails platform will be free to evolve at its own pace without requiring changes to the SBOM report.
The full dependency graph could be generated asynchronously, after the SBOM ingestion. This would limit the impact on performance during SBOM ingestion and provide more flexibility in what we want to store. For instance, today only vulnerable dependencies have the dependency_path displayed. By generating the dependency graph asynchronously, we can know which components have associated vulnerabilities and limit the scope to those (to limit storage usage, if this is still a concern).
@zmartins yes it's matching the CycloneDX dependency graph spec.
One thing I've noticed in Trivy's CycloneDX report is that there can be inconsistencies in what value is used for the bom-ref (components) and ref (dependencies) properties. The spec calls for using something unique within the context of the document, not for it to be consistent ^^ As long as you can match a dependency with a corresponding component there should be no issue, but it's something to be aware of.
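A consumer can guard against this kind of inconsistency by checking that every ref used in the dependency graph resolves to a bom-ref declared in the same document. The sketch below is a minimal, hypothetical check on an already parsed CycloneDX JSON document; it treats `metadata.component` as a valid target in addition to the entries in `components`, and it does not reflect any existing parser.

```python
def unresolved_refs(sbom):
    """Return the sorted refs in the dependency graph that match no bom-ref."""
    known = {c["bom-ref"] for c in sbom.get("components", []) if "bom-ref" in c}
    root_ref = sbom.get("metadata", {}).get("component", {}).get("bom-ref")
    if root_ref:
        known.add(root_ref)
    missing = set()
    for dependency in sbom.get("dependencies", []):
        for ref in [dependency["ref"], *dependency.get("dependsOn", [])]:
            if ref not in known:
                missing.add(ref)
    return sorted(missing)


# Hypothetical document: one dependency uses a purl where the component
# declares a UUID as its bom-ref, so the two cannot be matched.
sbom = {
    "metadata": {"component": {"bom-ref": "root"}},
    "components": [
        {"bom-ref": "pkg:npm/a@1.0.0", "name": "a"},
        {"bom-ref": "3f2b6c1e-0000-4000-8000-000000000001", "name": "b"},
    ],
    "dependencies": [
        {"ref": "root", "dependsOn": ["pkg:npm/a@1.0.0"]},
        {"ref": "pkg:npm/a@1.0.0", "dependsOn": ["pkg:npm/b@2.0.0"]},
    ],
}
print(unresolved_refs(sbom))
```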
Doing this issue in 16.10 sounds ambitious but we can probably give it a shot in 16.11. In any case I think the implementation depends on the outcome of Spike: Replace Gemnasium with open source nativ... (#434143) so we might want to move this forward if we can. The goal is to deliver in time to ship the whole feature in 17.0 to replace the existing dependency_path that will be removed.
The goal is to reach parity with the DS report, and it seems unlikely that we find existing CDX generators that provide dependency graphs for Nuget, yarn, sbt, and conan. Then we would have to contribute dependency graph generation to some of these generators, which would take time.
Switching to SBOM generators other than Gemnasium by the end of 16.11 doesn't seem realistic. We don't even have issues for that actually; we'll create them as part of the spike issue.
Adding dependency graph to Gemnasium makes the issue actionable. We have a clear scope, and the issue can be refined.
I'm changing the issue accordingly. Please object if you disagree.
We might introduce an environment variable to disable that feature, if the SBOM size becomes a problem. I wouldn't do it though, because CI/env variables might be expensive to document, test, and maintain.
I don't see any documentation task for this. Do you?
I think we'd rather update the doc once we have the rails counterpart implemented. The user facing change is more likely to justify a doc update.
We might introduce an environment variable to disable that feature, if the SBOM size becomes a problem. I wouldn't do it though, because CI/env variables might be expensive to document, test, and maintain.
I would not do it either. We can still add this later if this becomes necessary.
Fabien Catteau changed title from Add dependency graph information to CycloneDX SBOMs to Add dependency graph information to Gemnasium's CycloneDX SBOMs