
Spike: Efficient storage of redundant licenses for SBOM component versions

Time-box: 2 days

Topic to Evaluate

In #372212 (comment 1091206383) we identified that licenses tend to be the same across all versions of an SBOM component, and that we should exploit this redundancy to keep the Postgres DB as lean as possible. We identified two ways to remove the redundancy:

  • Track license data of SBOM components along with ranges of versions sharing the same licenses.
  • Track license data of SBOM components, and also track license data of SBOM component versions. The former is used as the default, and the latter is set only for versions whose licenses differ from that default.
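
To make the two options concrete, here is a minimal Python sketch of how a license lookup could work under each one. It is an illustration under stated assumptions, not the proposed schema: the class names, the `example-lib` component, and the `parse_version` helper are hypothetical, and versions are assumed to be plain dotted numbers, whereas real package managers each define their own version ordering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: versions are plain dotted integers such as "1.4.0".
    # Real ecosystems (Maven, npm, RubyGems, ...) have their own ordering rules.
    return tuple(int(part) for part in version.split("."))


@dataclass
class RangeStore:
    """Option 1: licenses attached to inclusive version ranges."""

    # component name -> list of (lowest version, highest version, licenses)
    ranges: dict[str, list[tuple[str, str, frozenset[str]]]] = field(default_factory=dict)

    def licenses(self, component: str, version: str) -> frozenset[str]:
        wanted = parse_version(version)
        for low, high, licenses in self.ranges.get(component, []):
            if parse_version(low) <= wanted <= parse_version(high):
                return licenses
        return frozenset()  # no range covers this version


@dataclass
class DefaultStore:
    """Option 2: a default per component, plus per-version exceptions."""

    defaults: dict[str, frozenset[str]] = field(default_factory=dict)
    # (component name, version) -> licenses, only for versions that differ
    exceptions: dict[tuple[str, str], frozenset[str]] = field(default_factory=dict)

    def licenses(self, component: str, version: str) -> frozenset[str]:
        fallback = self.defaults.get(component, frozenset())
        return self.exceptions.get((component, version), fallback)


range_store = RangeStore({
    "example-lib": [
        ("1.0.0", "1.9.0", frozenset({"MIT"})),
        ("2.0.0", "2.3.0", frozenset({"Apache-2.0"})),
    ]
})
default_store = DefaultStore(
    defaults={"example-lib": frozenset({"MIT"})},
    exceptions={("example-lib", "2.0.0"): frozenset({"Apache-2.0"})},
)
```

In this sketch, option 1 needs version ordering to answer a lookup, while option 2 only needs exact matches on the version string. That difference is likely to matter for the "speed/complexity" criterion.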

Before moving forward with Update DB schema to store data imported from th... (#373163 - closed) and updating the DB schema, we need to identify the best option.

We'll compare the options using the following criteria:

  • size of the DB tables used to track licenses of SBOM components
  • feasibility of importing the External License Database
  • feasibility of License Scanning
  • speed/complexity when performing License Scanning

Tasks to Evaluate

  • Collect data on redundancies.
    • How frequent are SBOM components whose licenses are the same across all versions?
    • Ideally, get a distribution of the number of distinct sets of licenses per SBOM component.
  • Evaluate storing licenses of components with version ranges.
    • Estimate size of DB tables.
    • Check feasibility of license data import.
    • Check feasibility of License Scanning.
    • Estimate relative complexity of License Scanning.
  • Evaluate storing licenses of components (default), and licenses of versions (exceptions).
    • Estimate size of DB tables.
    • Check feasibility of license data import.
    • Check feasibility of License Scanning.
    • Estimate relative complexity of License Scanning.
  • Evaluate storing licenses of component versions, omitting redundancies.
    • Estimate size of DB tables.
    • Check feasibility of license data import.
    • Check feasibility of License Scanning.
    • Estimate relative complexity of License Scanning.
  • Choose one option.
  • Update #373163 (closed) with the option that's been selected.
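
As a starting point for the data collection and size estimation tasks above, the following Python sketch computes the distribution of distinct license sets per component and rough row counts for each storage approach. It assumes the input is a list of hypothetical `(component, version, licenses)` tuples, already sorted by version within each component. The function names and sample data are illustrative only, and the results are row counts, not table sizes in bytes. The `per_version` figure is a naive baseline with one row per component version.

```python
from collections import Counter, defaultdict


def distinct_license_sets(rows):
    """Map each component to the number of distinct license sets it has."""
    sets_by_component = defaultdict(set)
    for component, _version, licenses in rows:
        sets_by_component[component].add(frozenset(licenses))
    return {name: len(sets) for name, sets in sets_by_component.items()}


def distribution(rows):
    """Count how many components have 1, 2, 3, ... distinct license sets."""
    return Counter(distinct_license_sets(rows).values())


def estimate_rows(rows):
    """Rough row counts per approach (assumes rows are sorted by version)."""
    by_component = defaultdict(list)
    for component, _version, licenses in rows:
        by_component[component].append(frozenset(licenses))

    per_version = default_plus_exceptions = version_ranges = 0
    for license_sets in by_component.values():
        # Baseline: one row per component version.
        per_version += len(license_sets)
        # Default + exceptions: one default row (the most common set),
        # plus one row for every version that differs from it.
        most_common_count = Counter(license_sets).most_common(1)[0][1]
        default_plus_exceptions += 1 + len(license_sets) - most_common_count
        # Ranges: one row per run of consecutive versions sharing a set.
        changes = sum(1 for a, b in zip(license_sets, license_sets[1:]) if a != b)
        version_ranges += 1 + changes
    return {
        "per_version": per_version,
        "default_plus_exceptions": default_plus_exceptions,
        "version_ranges": version_ranges,
    }


sample_rows = [
    ("a", "1.0", ["MIT"]),
    ("a", "1.1", ["MIT"]),
    ("a", "2.0", ["Apache-2.0"]),
    ("a", "2.1", ["MIT"]),
    ("b", "0.1", ["BSD-3-Clause"]),
    ("b", "0.2", ["BSD-3-Clause"]),
]
```

In the sample, component `a` switches license and then switches back, so the range approach needs three rows for it while the default-plus-exceptions approach needs two. Real data from the External License Database would show which pattern dominates.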

Risks and Implementation Considerations

/cc @brytannia

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive) point:. For example, the following are all valid points:

  • #### POINT: This is a point
  • * point: This is a point
  • + Point: This is a point
  • - pOINT: This is a point
  • point: This is a **point**
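
The rule above can be approximated with a regular expression. This is a sketch of the described behavior, not Discoto's actual implementation: `POINT_RE` and `extract_point` are hypothetical names, and the pattern assumes heading markers of one to six `#` characters and the list markers `*`, `+`, and `-`.

```python
import re

# Optional heading marker ("#" to "######") or list marker ("*", "+", "-"),
# followed by "point:" in any letter case, then the point text.
POINT_RE = re.compile(r"^\s*(?:#{1,6}\s+|[*+-]\s+)?point:\s*(.*)$", re.IGNORECASE)


def extract_point(line):
    """Return the point text if the line declares a point, else None."""
    match = POINT_RE.match(line)
    return match.group(1).strip() if match else None
```

Markdown in the point text is returned untouched, which matches the note that it is propagated into the summaries.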

Note that any markdown used in the point text will also be propagated into the topic summaries.

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive) topic:. For example, the following are all valid topics:

  • # Topic: Inline discussion topic 1
  • ## TOPIC: **{+A Green, bolded topic+}**
  • ### tOpIc: Another topic

Quick Actions

Action | Description
/discuss sub-topic TITLE | Create an issue for a sub-topic. Does not work in epics.
/discuss link ISSUABLE-LINK | Link an issuable as a child of this discussion.


Discoto Settings
---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending

See the settings schema for details.

Edited by Fabien Catteau