Sort dependencies by license
What does this MR do and why?
This change is gated by the group_level_licenses
feature flag. This change adds an option to sort dependencies, on the group dependency list, by SPDX identifier. The sbom_occurrences.licenses
column is a jsonb
column with a schema that resembles the following:
[
{
"url": "https://spdx.org/licenses/Apache-2.0.html",
"name": "Apache 2.0",
"spdx_identifier": "Apache-2.0"
},
{
"url": "https://spdx.org/licenses/Artistic-2.0.html",
"name": "Artistic-2.0",
"spdx_identifier": "Artistic-2.0"
},
{
"url": "https://spdx.org/licenses/BSD-3-Clause.html",
"name": "New BSD",
"spdx_identifier": "BSD-3-Clause"
},
{
"url": "https://spdx.org/licenses/GPL-3.0.html",
"name": "GPLv3",
"spdx_identifier": "GPL-3.0"
}
]
The change in this MR attempts to add support for sorting Sbom::Occurence
records by licenses. The # of elements in the sbom_occurrences.licenses
column can be between 0-N with no upper bound at this time.
The change in this MR attempts to sort records by the first detected license then by the second detected license.
Related to:
- Add `&sort_by=license` query string parameter t... (#422086 - closed) • mo khan • 16.4 • At risk
- Add sort-by-license to group-level dependencies... (!130280 - merged) • David Pisek • 16.4
Sort by licenses ascending:
SELECT
sbom_occurrences.*,
agg_occurrences.occurrence_count,
agg_occurrences.project_count
FROM
sbom_occurrences
INNER JOIN (
SELECT
component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM
sbom_occurrences
WHERE
project_id IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{9970}'))))
GROUP BY
component_id) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE
"sbom_occurrences"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{9970}'))))
ORDER BY
licenses #> '{0,spdx_identifier}' ASC,
licenses #> '{1,spdx_identifier}' ASC
LIMIT 20 OFFSET 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/21933/commands/71064
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Merge request reports
Activity
changed milestone to %16.4
added backend featureenhancement groupthreat insights labels
assigned to @mokhax
added typefeature label
added devopsgovern sectionsec labels
- A deleted user
added database databasereview pending labels
10 Warnings 3dcde825: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. c5db5119: The commit subject must contain at least 3 words. For more information, take a look at our Commit message guidelines. b60d4695: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. a38b3e5b: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 3ac589d1: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. b17fed88: The commit subject must start with a capital letter. For more information, take a look at our Commit message guidelines. a1c6e235: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. b2a541ec: The commit subject must contain at least 3 words. For more information, take a look at our Commit message guidelines. This merge request has more than 20 commits which may cause issues in some of the jobs. If you see errors like missing commits, please consider squashing some commits so it is within 20 commits. featureaddition and featureenhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.
For more information, see:
- The Handbook page on merge request types.
- The definition of done documentation.
1 Message CHANGELOG missing: If this merge request needs a changelog entry, add the
Changelog
trailer to the commit message you want to add to the changelog.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Javiera Tapia (
@jtapiab
) (UTC-3, 3 hours ahead of@mokhax
)Piotr Skorupa (
@pskorupa
) (UTC+2, 8 hours ahead of@mokhax
)database Max Woolf (
@mwoolf
) (UTC+1, 7 hours ahead of@mokhax
)Prabakaran Murugesan (
@praba.m7n
) (UTC+2, 8 hours ahead of@mokhax
)frontend Ammar Alakkad (
@aalakkad
) (UTC+3, 9 hours ahead of@mokhax
)Lukas Eipert (
@leipert
) (UTC+2, 8 hours ahead of@mokhax
)Please check reviewer's status!
Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Danger@mokhax Some end-to-end (E2E) tests should run based on the stage label.
Please start the
trigger-omnibus-and-follow-up-e2e
job in theqa
stage and ensure tests in thefollow-up-e2e:package-and-test-ee
pipeline pass before this MR is merged. (E2E tests are computationally intensive and don't run automatically for every push/rebase, so we ask you to run this job manually at least once.)To run all E2E tests, apply the pipeline:run-all-e2e label and run a new pipeline.
E2E test jobs are allowed to fail due to flakiness. See current failures at the latest pipeline triage issue.
Once done, apply the
emoji on this comment.Team members only: for any questions or help, reach out on the internal
#quality
Slack channel.mentioned in epic &10090
@ghavenga, @minac, @Quintasan can you offer any ideas on how I can optimize the query in the description to sort by
sbom_occurrences.licenses#>spdx_identifier
?Below is some example data:
localhost/gitlabhq_production= # select distinct(licenses) from sbom_occurrences where project_id in (13083, 278964, 47209117, 7764) and licenses != '[]' order by licenses asc; -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> [{"url": "https://spdx.org/licenses/0BSD.html", "name": "BSD Zero Clause License", "spdx_identifier": "0BSD"}] [{"url": "https://spdx.org/licenses/Apache-2.0.html", "name": "Apache 2.0", "spdx_identifier": "Apache-2.0"}] [{"url": "https://spdx.org/licenses/BSD-2-Clause.html", "name": "Simplified BSD", "spdx_identifier": "BSD-2-Clause"}] [{"url": "https://spdx.org/licenses/BSD-3-Clause.html", "name": "New BSD", "spdx_identifier": "BSD-3-Clause"}] [{"url": "https://spdx.org/licenses/BlueOak-1.0.0.html", "name": "Blue Oak Model License 1.0.0", "spdx_identifier": "BlueOak-1.0.0"}] [{"url": "https://spdx.org/licenses/CC-BY-3.0.html", "name": "Creative Commons Attribution 3.0 Unported", "spdx_identifier": "CC-BY-3.0"}] [{"url": "https://spdx.org/licenses/CC-BY-4.0.html", "name": "Creative Commons Attribution 4.0 International", "spdx_identifier": "CC-BY-4.0"}] [{"url": "https://spdx.org/licenses/CC-PDDC.html", "name": "Creative Commons Public Domain Dedication and Certification", "spdx_identifier": "CC-PDDC"}] [{"url": "https://spdx.org/licenses/CC0-1.0.html", "name": "CC0 1.0 Universal", "spdx_identifier": "CC0-1.0"}] [{"url": "https://spdx.org/licenses/EPL-2.0.html", "name": "EPL-2.0", "spdx_identifier": "EPL-2.0"}] [{"url": "https://spdx.org/licenses/GPL-2.0.html", "name": "GPLv2", "spdx_identifier": "GPL-2.0"}] [{"url": "https://spdx.org/licenses/GPL-3.0+.html", "name": "GPL-3.0+", "spdx_identifier": "GPL-3.0+"}] [{"url": "https://spdx.org/licenses/ISC.html", "name": "ISC", "spdx_identifier": "ISC"}] [{"url": "https://spdx.org/licenses/LGPL-2.1+.html", "name": "GNU Library General Public License v2.1 or later", "spdx_identifier": "LGPL-2.1+"}] [{"url": "https://spdx.org/licenses/LGPL-3.0-only.html", "name": "LGPL", "spdx_identifier": "LGPL-3.0-only"}] [{"url": "https://spdx.org/licenses/LGPL-3.0.html", "name": "LGPL-3.0", "spdx_identifier": "LGPL-3.0"}] [{"url": "https://spdx.org/licenses/MIT.html", "name": "MIT", "spdx_identifier": "MIT"}] [{"url": "https://spdx.org/licenses/MPL-2.0.html", "name": "Mozilla Public License 2.0", "spdx_identifier": "MPL-2.0"}] [{"url": "https://spdx.org/licenses/MulanPSL-2.0.html", "name": "Mulan Permissive Software License, Version 2", "spdx_identifier": "MulanPSL-2.0"}] [{"url": "https://spdx.org/licenses/OFL-1.1.html", "name": "SIL Open Font License 1.1", "spdx_identifier": "OFL-1.1"}] [{"url": "https://spdx.org/licenses/Python-2.0.html", "name": "Python Software Foundation License", "spdx_identifier": "Python-2.0"}] [{"url": "https://spdx.org/licenses/Ruby.html", "name": "ruby", "spdx_identifier": "Ruby"}] [{"url": "https://spdx.org/licenses/Unlicense.html", "name": "The Unlicense", "spdx_identifier": "Unlicense"}] [{"url": "https://spdx.org/licenses/WTFPL.html", "name": "WTFPL", "spdx_identifier": "WTFPL"}] [{"url": "https://spdx.org/licenses/0BSD.html", "name": "BSD Zero Clause License", "spdx_identifier": "0BSD"}, {"url": "https://spdx.org/licenses/AFL-2.1.html", "name": "Academic Free License v2.1", "spdx_identifier": "AFL-2.1"}] [{"url": "https://spdx.org/licenses/Apache-2.0.html", "name": "Apache 2.0", "spdx_identifier": "Apache-2.0"}, {"url": "https://spdx.org/licenses/MIT.html", "name": "MIT", "spdx_identifier": "MIT"}] [{"url": "https://spdx.org/licenses/BSD-2-Clause.html", "name": "Simplified BSD", "spdx_identifier": "BSD-2-Clause"}, {"url": "https://spdx.org/licenses/GPL-3.0-only.html", "name": "GNU General Public License v3.0 only", "spdx_identifier": "GPL-3.0-only"}] [{"url": "https://spdx.org/licenses/BSD-2-Clause.html", "name": "Simplified BSD", "spdx_identifier": "BSD-2-Clause"}, {"url": "https://spdx.org/licenses/MIT.html", "name": "MIT", "spdx_identifier": "MIT"}] [{"url": "https://spdx.org/licenses/BSD-2-Clause.html", "name": "Simplified BSD", "spdx_identifier": "BSD-2-Clause"}, {"url": "https://spdx.org/licenses/Ruby.html", "name": "ruby", "spdx_identifier": "Ruby"}] [{"url": "https://spdx.org/licenses/GPL-2.0.html", "name": "GPLv2", "spdx_identifier": "GPL-2.0"}, {"url": "https://spdx.org/licenses/Ruby.html", "name": "ruby", "spdx_identifier": "Ruby"}] [{"url": "https://spdx.org/licenses/LGPL-3.0+.html", "name": "LGPL-3.0+", "spdx_identifier": "LGPL-3.0+"}, {"url": "https://spdx.org/licenses/Ruby.html", "name": "ruby", "spdx_identifier": "Ruby"}] [{"url": "https://spdx.org/licenses/MIT.html", "name": "MIT", "spdx_identifier": "MIT"}, {"url": "https://spdx.org/licenses/Ruby.html", "name": "ruby", "spdx_identifier": "Ruby"}] [{"url": "https://spdx.org/licenses/MPL-2.0.html", "name": "Mozilla Public License 2.0", "spdx_identifier": "MPL-2.0"}, {"url": "https://spdx.org/licenses/MPL-2.0-no-copyleft-exception.html", "name": "Mozilla Public License 2.0 (no copyleft exception)", "spdx_identifier": "MPL-2.0-no-copyleft-exception"}] [{"url": "https://spdx.org/licenses/Artistic-2.0.html", "name": "Artistic-2.0", "spdx_identifier": "Artistic-2.0"}, {"url": "https://spdx.org/licenses/GPL-2.0+.html", "name": "GPL-2.0+", "spdx_identifier": "GPL-2.0+"}, {"url": "https://spdx.org/licenses/MIT.html", "name": "MIT", "spdx_identifier": "MIT"}] [{"url": "https://spdx.org/licenses/BSD-2-Clause.html", "name": "Simplified BSD", "spdx_identifier": "BSD-2-Clause"}, {"url": "https://spdx.org/licenses/BSD-3-Clause.html", "name": "New BSD", "spdx_identifier": "BSD-3-Clause"}, {"url": "https://spdx.org/licenses/MPL-2.0.html", "name": "Mozilla Public License 2.0", "spdx_identifier": "MPL-2.0"}] [{"url": "https://spdx.org/licenses/BSD-3-Clause.html", "name": "New BSD", "spdx_identifier": "BSD-3-Clause"}, {"url": "https://spdx.org/licenses/GPL-1.0-only.html", "name": "GNU General Public License v1.0 only", "spdx_identifier": "GPL-1.0-only"}, {"url": "https://spdx.org/licenses/PSF-2.0.html", "name": "Python Software Foundation License 2.0", "spdx_identifier": "PSF-2.0"}] [{"url": "https://spdx.org/licenses/Apache-2.0.html", "name": "Apache 2.0", "spdx_identifier": "Apache-2.0"}, {"url": "https://spdx.org/licenses/Artistic-2.0.html", "name": "Artistic-2.0", "spdx_identifier": "Artistic-2.0"}, {"url": "https://spdx.org/licenses/BSD-3-Clause.html", "name": "New BSD", "spdx_identifier": "BSD-3-Clause"}, {"url": "https://spdx.org/licenses/GPL-3.0.html", "name": "GPLv3", "spdx_identifier": "GPL-3.0"}] (37 rows) Time: 155.255 ms
[ { "url": "https://spdx.org/licenses/Apache-2.0.html", "name": "Apache 2.0", "spdx_identifier": "Apache-2.0" }, { "url": "https://spdx.org/licenses/Artistic-2.0.html", "name": "Artistic-2.0", "spdx_identifier": "Artistic-2.0" }, { "url": "https://spdx.org/licenses/BSD-3-Clause.html", "name": "New BSD", "spdx_identifier": "BSD-3-Clause" }, { "url": "https://spdx.org/licenses/GPL-3.0.html", "name": "GPLv3", "spdx_identifier": "GPL-3.0" } ]
I have tried a few different types of indexes such as:
CREATE INDEX ON sbom_occurrences USING GIN ((licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')); CREATE INDEX ON sbom_occurrences USING GIN (licenses); CREATE INDEX ON sbom_occurrences USING GIN (licenses jsonb_path_ops); CREATE INDEX ON sbom_occurrences USING GIN (project_id, (licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')); CREATE INDEX ON sbom_occurrences USING GIN (project_id, licenses); CREATE INDEX ON sbom_occurrences USING GIN (project_id, (licenses jsonb_path_ops)); CREATE INDEX ON sbom_occurrences USING GIN (project_id, (licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')) WHERE (licenses != '[]'); CREATE INDEX ON sbom_occurrences USING GIN (project_id, licenses) WHERE (licenses != '[]'); CREATE INDEX ON sbom_occurrences USING GIN ((project_id), (licenses) jsonb_path_ops) WHERE (licenses != '[]'); CREATE INDEX ON sbom_occurrences USING BTREE ((licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')); CREATE INDEX ON sbom_occurrences USING BTREE (licenses); CREATE INDEX ON sbom_occurrences USING BTREE (project_id, (licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')); CREATE INDEX ON sbom_occurrences USING BTREE (project_id, licenses); CREATE INDEX ON sbom_occurrences USING BTREE (project_id, (licenses#>'{0,spdx_identifier}'), (licenses#>'{1,spdx_identifier}')) WHERE (licenses != '[]'); CREATE INDEX ON sbom_occurrences USING BTREE (project_id, licenses) WHERE (licenses != '[]');
... but I haven't been able to convince the query planner to utilize any of these indexes.
The explain queries that I'm starting with are:
EXPLAIN SELECT * FROM sbom_occurrences WHERE project_id IN (13083, 278964, 47209117, 7764) ORDER BY licenses#>'{0,spdx_identifier}' ASC; EXPLAIN SELECT * FROM sbom_occurrences ORDER BY licenses#>'{0,spdx_identifier}' ASC; EXPLAIN SELECT * FROM sbom_occurrences ORDER BY licenses#>'{0,spdx_identifier}' ASC, licenses#>'{1,spdx_identifier}' ASC;
Edited by mo khan- Resolved by mo khan
- Resolved by mo khan
added 1 commit
- 9dd16f76 - rename query string parameter from spdx_identifier to license
- Resolved by Brian Williams
- Resolved by Brian Williams
- Resolved by Brian Williams
- Resolved by Brian Williams
mentioned in issue #422352 (closed)
added databasereviewed label and removed databasereview pending label
- Resolved by mo khan
@zmartins
, thanks for approving this merge request.This is the first time the merge request has been approved. To ensure full test coverage, a new pipeline will be started shortly.
For more info, please refer to the following links:
added pipeline:mr-approved label
requested review from @bwill
removed review request for @zmartins
removed review request for @bwill
requested review from @bwill
- Resolved by Brian Williams
- A deleted user
added Data WarehouseImpact Check label
Database migrations (on the main database)
Migrations included in this change have been executed on gitlab.com data for testing purposes. For details, please see the migration testing pipeline (limited access).
Migration Type Total runtime Result DB size change 20230907204731 - AddIndexToSbomOccurrencesLicenses Post deploy 174.6 s +480.96 MiB Runtime Histogram for all migrations
Query Runtime Count 0 seconds - 0.01 seconds 0 0.01 seconds - 0.1 seconds 2 0.1 seconds - 1 second 0 1 second - 5 seconds 0 5 seconds - 15 seconds 0 15 seconds - 5 minutes 1 5 minutes + 0 Migration: 20230907204731 - AddIndexToSbomOccurrencesLicenses
- Type: Post deploy
- Duration: 174.6 s
- Database size change: +480.96 MiB
Histogram for AddIndexToSbomOccurrencesLicenses
Query Runtime Count 0 seconds - 0.01 seconds 0 0.01 seconds - 0.1 seconds 2 0.1 seconds - 1 second 0 1 second - 5 seconds 0 5 seconds - 15 seconds 0 15 seconds - 5 minutes 1 5 minutes + 0 Background Migration: CreateComplianceStandardsAdherence
Sampled 3 batches. Estimated Time to complete: 3 weeks, 2 days, 4 hours, 56 minutes, and 16 seconds
- Interval: 120s
- Total tuple count: 32108384
- Max batch size: 0
- Estimated seconds to complete: 2004976s
- Estimated number of batches: 3210
- Average batch time: 624.6s
- Batch size: 10000
- N. of batches sampled: 3
- N. of failed batches: 0
Time estimation is conservative and based on sampling production data in a test environment. It represents the max time that migration could take. The actual time may differ from this estimation.
_Consider changing max_batch_size and interval if this estimate is unacceptable._
Histogram of batch runtimes for CreateComplianceStandardsAdherence
Batch Runtime Count 0 seconds - 10 seconds 0 10 seconds - 1 minute 0 1 minute - 2 minutes 0 2 minutes - 3 minutes 0 3 minutes - 5 minutes 1 5 minutes + 2 Histogram across all sampled batches of CreateComplianceStandardsAdherence
Query Runtime Count 0 seconds - 0.1 seconds 45329 0.1 seconds - 0.5 seconds 186 0.5 seconds - 1 second 32 1 second - 2 seconds 25 2 seconds - 5 seconds 4 5 seconds + 0 Other information
Other migrations pending on GitLab.com
Migration Type Total runtime Result DB size change 20230906100001 - AddMetadataColumnsToPackagesPyPiMetadata Regular 7.1 s +0.00 B 20230911095016 - AddRootNamespaceIdToProjectStatistics Regular 100.5 s +212.20 MiB 20230818142801 - QueueCreateComplianceStandardsAdherence Post deploy 2.5 s +0.00 B 20230908072558 - AnalyzePCiRunnerMachineBuilds Post deploy 1.7 s +0.00 B 20230908072612 - AnalyzePCiJobAnnotations Post deploy 1.7 s +0.00 B 20230908072626 - AnalyzePCiBuildsMetadata Post deploy 1.7 s +0.00 B 20230908072639 - AnalyzePCiBuilds Post deploy 1.7 s +0.00 B Clone details
Clone ID Clone Created At Clone Data Timestamp Expected Removal Time database-testing-2312369-11205205-main
2023-09-13T23:06:53Z 2023-09-12T16:23:05Z 2023-09-14 11:46:47 +0000 database-testing-2312369-11205205-ci
2023-09-13T23:06:53Z 2023-09-13T20:46:47Z 2023-09-14 11:46:47 +0000 Database migrations (on the ci database)
Migrations included in this change have been executed on gitlab.com data for testing purposes. For details, please see the migration testing pipeline (limited access).
Migration Type Total runtime Result DB size change 20230907204731 - AddIndexToSbomOccurrencesLicenses Post deploy 4.0 s +8.00 KiB [note] Runtime Histogram for all migrations
Query Runtime Count 0 seconds - 0.01 seconds 0 0.01 seconds - 0.1 seconds 2 0.1 seconds - 1 second 1 1 second - 5 seconds 0 5 seconds - 15 seconds 0 15 seconds - 5 minutes 0 5 minutes + 0 Migration: 20230907204731 - AddIndexToSbomOccurrencesLicenses
- Type: Post deploy
- Duration: 4.0 s
- Database size change: +8.00 KiB [note]
Histogram for AddIndexToSbomOccurrencesLicenses
Query Runtime Count 0 seconds - 0.01 seconds 0 0.01 seconds - 0.1 seconds 2 0.1 seconds - 1 second 1 1 second - 5 seconds 0 5 seconds - 15 seconds 0 15 seconds - 5 minutes 0 5 minutes + 0 Other information
Other migrations pending on GitLab.com
Migration Type Total runtime Result DB size change 20230818142801 - QueueCreateComplianceStandardsAdherence Post deploy 2.7 s +0.00 B Clone details
Clone ID Clone Created At Clone Data Timestamp Expected Removal Time database-testing-2312369-11205205-main
2023-09-13T23:06:53Z 2023-09-12T16:23:05Z 2023-09-14 11:46:47 +0000 database-testing-2312369-11205205-ci
2023-09-13T23:06:53Z 2023-09-13T20:46:47Z 2023-09-14 11:46:47 +0000
Brought to you by gitlab-org/database-team/gitlab-com-database-testing. Epic
- A deleted user
added database-testing-automation label