Draft: Add service to match SBOM components and vulns
What does this MR do and why?
This MR adds a service to match SBOM components and advisories. It exposes a single public method Gitlab::VulnerabilityScanning::PackageAdvisories#fetch which takes as input an array of SBOM component which are Hashie::Mash objects containing the following fields:
purl_typeversionname
The fetch method searches through the pm_affected_packages table in batches of 700 records for entries that match the given purl_type and name, and joins against the pm_advisories table. It retrieves the following fields from both of these tables:
- pm_advisories.title,
- pm_advisories.description,
- pm_advisories.cvss_v2,
- pm_advisories.cvss_v3,
- pm_advisories.urls,
- pm_advisories.identifiers,
- pm_affected_packages.purl_type,
- pm_affected_packages.package_name,
- pm_affected_packages.solution,
- pm_affected_packages.affected_range
Benchmarks
As discussed in this thread, we need a way to keep track of the component version values. Two options were discussed:
- using a CTE (implemented in this MR)
- using an in-memory hash
The benchmarks below include results for both the CTE method and the in-memory hash for the sake of comparison, however, the numbers show that a CTE is a more efficient approach, which has been implemented in this MR.
To understand how these benchmarks compare to existing code, please see the MR for updating package metadata license lookup to use compressed data.
You'll notice that the benchmarks for updating package metadata license lookup to use compressed data are much faster, at 0.550493 s for fetching 5000 compressed records versus 13.587334 s in this MR for fetching 5000 components where 4000 of them have affected packages (a very unlikely situation). However, the code for updating package metadata license lookup to use compressed data is user-facing, so we've tried to make it as fast as possible, whereas the code in this MR is not user-facing, since it'll be kicked off by a sidekiq job in the background, so slower response times are more acceptable.
-
1000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 1000 affected packages =================================================================== user system total real 100 records 0.138171 0.005587 0.143758 ( 0.161953) 200 records 0.183474 0.005838 0.189312 ( 0.222943) 300 records 0.298285 0.008604 0.306889 ( 0.348068) 400 records 0.382594 0.009639 0.392233 ( 0.441390) 500 records 0.486761 0.015195 0.501956 ( 0.559242) 600 records 0.631321 0.019927 0.651248 ( 0.722986) 700 records 0.655312 0.016848 0.672160 ( 0.752760) 800 records 0.835680 0.018616 0.854296 ( 0.950211) 900 records 0.922340 0.025124 0.947464 ( 1.059607) 1000 records 0.829426 0.220630 1.050056 ( 1.171212) 2000 records 1.614828 0.048416 1.663244 ( 1.892707) 3000 records 2.548333 0.075535 2.623868 ( 2.986696) 4000 records 3.420843 0.099478 3.520321 ( 3.990201) 5000 records 4.427780 0.122395 4.550175 ( 5.139824) =================================================================== Benchmark results with in-memory hash version lookup with components containing 1000 affected packages =================================================================== user system total real 100 records 0.204993 0.006792 0.211785 ( 0.229392) 200 records 0.205858 0.008854 0.214712 ( 0.238709) 300 records 0.298505 0.008785 0.307290 ( 0.339396) 400 records 0.362667 0.004269 0.366936 ( 0.405912) 500 records 0.435338 0.009480 0.444818 ( 0.489894) 600 records 0.554225 0.014166 0.568391 ( 0.620221) 700 records 0.615241 0.015856 0.631097 ( 0.690592) 800 records 1.102302 0.024933 1.127235 ( 1.198590) 900 records 1.311307 0.026613 1.337920 ( 1.424326) 1000 records 1.724592 0.042284 1.766876 ( 1.862162) 2000 records 4.452901 0.099234 4.552135 ( 4.733542) 3000 records 10.395869 0.221649 10.617518 ( 10.885542) 4000 records 18.042436 0.660988 18.703424 ( 19.063492) 5000 records 28.101127 1.092482 29.193609 ( 29.654618) =================================================================== -
2000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 2000 affected packages =================================================================== user system total real 100 records 0.229686 0.017773 0.247459 ( 0.271600) 200 records 0.401965 0.014219 0.416184 ( 0.457788) 300 records 0.558023 0.017265 0.575288 ( 0.625430) 400 records 0.661989 0.022778 0.684767 ( 0.748146) 500 records 0.786053 0.188283 0.974336 ( 1.049490) 600 records 0.895979 0.027676 0.923655 ( 1.022674) 700 records 1.156021 0.036124 1.192145 ( 1.309637) 800 records 1.284261 0.041253 1.325514 ( 1.462144) 900 records 1.382215 0.042595 1.424810 ( 1.576426) 1000 records 1.467692 0.044432 1.512124 ( 1.672537) 2000 records 2.827644 0.084353 2.911997 ( 3.230935) 3000 records 4.428484 0.441466 4.869950 ( 5.359017) 4000 records 5.785917 0.167348 5.953265 ( 6.587623) 5000 records 7.490965 0.223624 7.714589 ( 8.532240) =================================================================== Benchmark results with in-memory hash version lookup with components containing 2000 affected packages =================================================================== user system total real 100 records 0.320153 0.010363 0.330516 ( 0.345818) 200 records 0.428199 0.016496 0.444695 ( 0.470798) 300 records 0.497978 0.013736 0.511714 ( 0.543679) 400 records 0.656134 0.018508 0.674642 ( 0.713427) 500 records 0.810513 0.020124 0.830637 ( 0.875123) 600 records 0.961890 0.022576 0.984466 ( 1.035953) 700 records 1.168881 0.026746 1.195627 ( 1.254277) 800 records 1.671087 0.039767 1.710854 ( 1.783577) 900 records 2.324100 0.052157 2.376257 ( 2.459570) 1000 records 2.578770 0.064218 2.642988 ( 2.732034) 2000 records 8.064514 0.176205 8.240719 ( 8.411064) 3000 records 19.925639 0.434941 20.360580 ( 20.622384) 4000 records 33.924963 1.198814 35.123777 ( 35.468715) 5000 records 56.534889 2.194605 58.729494 ( 59.169322) =================================================================== </details> -
3000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 3000 affected packages =================================================================== user system total real 100 records 0.370768 0.013872 0.384640 ( 0.413206) 200 records 0.559093 0.014545 0.573638 ( 0.619264) 300 records 0.640525 0.139878 0.780403 ( 0.836791) 400 records 0.777679 0.025087 0.802766 ( 0.887398) 500 records 0.932292 0.027473 0.959765 ( 1.061200) 600 records 1.033040 0.030242 1.063282 ( 1.180928) 700 records 1.246824 0.038131 1.284955 ( 1.424180) 800 records 1.348300 0.034815 1.383115 ( 1.536389) 900 records 1.694583 0.050465 1.745048 ( 1.920516) 1000 records 2.022909 0.056121 2.079030 ( 2.267054) 2000 records 3.602560 0.367170 3.969730 ( 4.371499) 3000 records 5.812396 0.177475 5.989871 ( 6.601697) 4000 records 7.624252 0.850542 8.474794 ( 9.290588) 5000 records 9.070385 0.298463 9.368848 ( 10.401297) =================================================================== Benchmark results with in-memory hash version lookup with components containing 3000 affected packages =================================================================== user system total real 100 records 0.328865 0.010059 0.338924 ( 0.355052) 200 records 0.512301 0.012661 0.524962 ( 0.551092) 300 records 0.704912 0.016913 0.721825 ( 0.754790) 400 records 1.099221 0.033035 1.132256 ( 1.172481) 500 records 1.164526 0.036386 1.200912 ( 1.257501) 600 records 1.342235 0.035431 1.377666 ( 1.431948) 700 records 1.572490 0.034914 1.607404 ( 1.667010) 800 records 1.945943 0.046310 1.992253 ( 2.062189) 900 records 3.525004 0.102298 3.627302 ( 3.711187) 1000 records 3.966712 0.093647 4.060359 ( 4.150162) 2000 records 12.804322 0.291073 13.095395 ( 13.263047) 3000 records 33.066726 0.716707 33.783433 ( 34.065425) 4000 records 56.166261 2.899048 59.065309 ( 59.409173) 5000 records 89.607063 4.136758 93.743821 ( 94.247169) =================================================================== </details> -
4000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 4000 affected packages =================================================================== user system total real 100 records 0.509272 0.016921 0.526193 ( 0.557925) 200 records 0.721374 0.161824 0.883198 ( 0.938143) 300 records 1.060313 0.033536 1.093849 ( 1.194125) 400 records 1.321734 0.035485 1.357219 ( 1.480358) 500 records 1.389149 0.038384 1.427533 ( 1.568226) 600 records 1.582359 0.047563 1.629922 ( 1.798454) 700 records 1.804642 0.045215 1.849857 ( 2.052383) 800 records 2.293646 0.066506 2.360152 ( 2.581256) 900 records 2.277810 0.330342 2.608152 ( 2.853636) 1000 records 2.595494 0.078682 2.674176 ( 2.930324) 2000 records 4.777275 0.133482 4.910757 ( 5.438388) 3000 records 7.124048 0.638050 7.762098 ( 8.528608) 4000 records 8.963478 0.274313 9.237791 ( 10.268583) 5000 records 11.238390 1.122125 12.360515 ( 13.587334) =================================================================== Benchmark results with in-memory hash version lookup with components containing 4000 affected packages =================================================================== user system total real 100 records 0.447054 0.013611 0.460665 ( 0.477721) 200 records 0.774120 0.021416 0.795536 ( 0.823644) 300 records 1.466924 0.030320 1.497244 ( 1.529864) 400 records 1.811677 0.041309 1.852986 ( 1.904842) 500 records 2.006759 0.047037 2.053796 ( 2.098774) 600 records 2.462239 0.053800 2.516039 ( 2.567483) 700 records 2.931670 0.101191 3.032861 ( 3.090854) 800 records 5.577336 0.141814 5.719150 ( 5.808348) 900 records 6.541524 0.164061 6.705585 ( 6.797809) 1000 records 6.872795 0.165147 7.037942 ( 7.127721) 2000 records 21.035475 0.488726 21.524201 ( 21.690226) 3000 records 50.627194 1.559264 52.186458 ( 52.466575) 4000 records 78.586900 3.386539 81.973439 ( 82.373508) 5000 records 122.841316 4.818588 127.659904 (128.212117) =================================================================== </details>
Data characteristics
Raw SQL
https://paste.depesz.com/s/w9u
WITH package_versions_cte ( purl_type, name, version ) AS (
VALUES
( 1, 'moodle/moodle', '3.5.17' ),
( 6, 'url-parse', '1.5.1' ),
...
( 5, 'org.apache.ant/ant', '1.9.0' ),
)
SELECT DISTINCT
package_versions_cte.version,
published_date,
title,
description,
cvss_v2,
cvss_v3,
urls,
identifiers,
pm_affected_packages.purl_type,
pm_affected_packages.package_name,
pm_affected_packages.solution,
pm_affected_packages.affected_range
FROM
pm_affected_packages
JOIN pm_advisories ON pm_advisories.id = pm_affected_packages.pm_advisory_id
JOIN package_versions_cte ON package_versions_cte.purl_type = pm_affected_packages.purl_type AND
package_versions_cte.name = pm_affected_packages.package_name
WHERE
(
pm_affected_packages.purl_type,
pm_affected_packages.package_name
) IN (
( 1, 'moodle/moodle' ),
( 6, 'url-parse' ),
...
( 5, 'org.apache.ant/ant' ),
);
Query plan
699 random records where every record has an affected package
- https://explain.depesz.com/s/QZNj
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/21081/commands/68882
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #371055 (closed)