Spike: Efficient storage of redundant licenses for SBOM component versions

Time-box: 2 days

Topic to Evaluate

In #372212 (comment 1091206383) we found that licenses tend to be the same across all versions of an SBOM component, and that we should leverage this redundancy to keep the Postgres DB as lean as possible. We identified two ways to remove it:

Track the license data of SBOM components along with the ranges of versions that share the same licenses.
Track the license data of SBOM components and of SBOM component versions. The component-level licenses serve as the default, and version-level licenses are only set for versions that differ from that default.
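To make the two options concrete, here is a minimal in-memory sketch of how each would answer "which licenses does version X have?". It is not a DB schema: the class names, `parse_version`, and the sample licenses are hypothetical, and version comparison is assumed to be plain dotted numbers, which real package ecosystems do not guarantee.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_version(version: str) -> tuple[int, ...]:
    # Hypothetical helper: only handles plain dotted numeric versions such as
    # "1.2.10". Real package ecosystems need their own comparison rules.
    return tuple(int(part) for part in version.split("."))


@dataclass
class RangeLicenses:
    """Option 1: licenses stored together with inclusive version ranges."""

    # Each entry is (first_version, last_version, licenses).
    ranges: list[tuple[str, str, frozenset[str]]]

    def licenses_for(self, version: str) -> frozenset[str] | None:
        key = parse_version(version)
        for first, last, licenses in self.ranges:
            if parse_version(first) <= key <= parse_version(last):
                return licenses
        return None  # version not covered by any stored range


@dataclass
class DefaultWithOverrides:
    """Option 2: a component-level default plus per-version exceptions."""

    default: frozenset[str]
    # Only versions whose licenses differ from the default get an entry.
    overrides: dict[str, frozenset[str]] = field(default_factory=dict)

    def licenses_for(self, version: str) -> frozenset[str]:
        return self.overrides.get(version, self.default)


# Usage with made-up data for a single component.
ranged = RangeLicenses([
    ("1.0.0", "1.4.2", frozenset({"MIT"})),
    ("2.0.0", "2.3.0", frozenset({"Apache-2.0"})),
])
ranged.licenses_for("1.3.0")  # returns frozenset({'MIT'})
ranged.licenses_for("1.9.0")  # returns None: falls between the stored ranges

with_default = DefaultWithOverrides(
    default=frozenset({"MIT"}),
    overrides={"2.0.0": frozenset({"Apache-2.0"})},
)
with_default.licenses_for("1.3.0")  # returns frozenset({'MIT'})
with_default.licenses_for("9.9.9")  # returns frozenset({'MIT'}): unknown versions get the default
```

The last call shows a trade-off of the second option: any version that has no override, including one not yet imported, silently inherits the default.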

Before moving forward with "Update DB schema to store data imported from th..." (#373163 - closed) and updating the DB schema, we need to identify the better of the two options.

We'll compare the options using the following criteria:

size of the DB tables used to track licenses of SBOM components
feasibility of importing the External License Database
feasibility of License Scanning
speed/complexity when performing License Scanning
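For the first criterion, a rough estimate of table size is the number of rows each option needs for a given version history. The sketch below counts rows only (not bytes or index size); the function names and the sample history are made up for illustration and are not measurements.

```python
from collections import Counter

# A version history is a list of (version, licenses) pairs sorted by version.


def rows_per_version(history):
    # Baseline without deduplication: one row per version.
    return len(history)


def rows_with_ranges(history):
    # Option 1: one row per run of consecutive versions sharing a license set.
    rows, previous = 0, None
    for _version, licenses in history:
        if licenses != previous:
            rows += 1
            previous = licenses
    return rows


def rows_with_default(history):
    # Option 2: one component-level default (the most common license set)
    # plus one row per version whose licenses differ from that default.
    if not history:
        return 0
    counts = Counter(licenses for _version, licenses in history)
    _default, default_count = counts.most_common(1)[0]
    return 1 + len(history) - default_count


# Made-up history: 50 MIT releases, then 50 Apache-2.0 releases.
history = [(f"1.{i}.0", frozenset({"MIT"})) for i in range(50)] + [
    (f"2.{i}.0", frozenset({"Apache-2.0"})) for i in range(50)
]
print(rows_per_version(history), rows_with_ranges(history), rows_with_default(history))
# prints: 100 2 51
```

A single license switch keeps the range-based count at 2, while the default-based count grows with every version on the non-default side of the switch.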

Tasks to Evaluate

Risks and Implementation Considerations

/cc @brytannia

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text `point:` (case-insensitive). For example, the following are all valid points:

#### POINT: This is a point

* point: This is a point

+ Point: This is a point

- pOINT: This is a point

point: This is a **point**

Note that any markdown used in the point text will also be propagated into the topic summaries.
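As a rough illustration of this rule, a point can be recognised with one case-insensitive regular expression. This is a hypothetical sketch, not Discoto's actual parser; `POINT_RE` and `extract_points` are names made up here.

```python
import re

# Hypothetical pattern (not Discoto's implementation): optional markdown
# heading (# to ######) or list marker (*, +, -), then "point:" in any case.
POINT_RE = re.compile(
    r"^\s*(?:#{1,6}\s+|[*+-]\s+)?point:\s*(?P<text>.+)$", re.IGNORECASE
)


def extract_points(comment: str) -> list[str]:
    """Return the text of every line in a comment that declares a point."""
    points = []
    for line in comment.splitlines():
        match = POINT_RE.match(line)
        if match:
            # Markdown in the point text is kept as-is.
            points.append(match.group("text").strip())
    return points


sample = """#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
Not a point: this line does not start with the keyword"""
print(extract_points(sample))
```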

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with `topic:` (case-insensitive). For example, the following are all valid topics:

# Topic: Inline discussion topic 1

## TOPIC: **{+A Green, bolded topic+}**

### tOpIc: Another topic
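The topic rule differs from the point rule in two ways: only a heading qualifies, and only the first line of the thread's first comment is checked. A hypothetical sketch (not Discoto's implementation; `TOPIC_RE` and `inline_topic_title` are made-up names):

```python
from __future__ import annotations

import re

# Hypothetical pattern (not Discoto's implementation): a markdown heading
# whose text starts with "topic:" in any case.
TOPIC_RE = re.compile(r"^#{1,6}\s+topic:\s*(?P<title>.+)$", re.IGNORECASE)


def inline_topic_title(first_comment: str) -> str | None:
    """Return the topic title if the thread's first line declares a topic."""
    lines = first_comment.splitlines()
    if not lines:
        return None
    match = TOPIC_RE.match(lines[0])  # only the first line is considered
    return match.group("title").strip() if match else None


print(inline_topic_title("### tOpIc: Another topic\nFirst comment body"))  # prints: Another topic
print(inline_topic_title("Topic: plain text is not a heading"))  # prints: None
```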

Quick Actions

Action	Description
`/discuss sub-topic TITLE`	Create an issue for a sub-topic. Does not work in epics
`/discuss link ISSUABLE-LINK`	Link an issuable as a child of this discussion

Last updated by this job

TOPIC Storing licenses for versions where licenses change #374901 (comment 1110980868)
- efficient storage #374901 (comment 1111924326)
- only needs comparison of versions #374901 (comment 1111928145)
- simple to evaluate #374901 (comment 1111939674)
- Size of DB table #374901 (comment 1111983450)
- Import #374901 (comment 1117940020)
- License Scanning #374901 (comment 1117957521)
- Total size of DB tables #374901 (comment 1118718916)
TOPIC Storing licenses of components with version range #374901 (comment 1111873025)
- need for a version range syntax #374901 (comment 1111929635)
- complex upserts when the licenses change #374901 (comment 1111932712)
TOPIC Storing licenses of components, and licenses of outlier versions #374901 (comment 1111895166)
- Too many records when there are multiple license sets #374901 (comment 1111918356)
- Too many inserts to track new versions with non-default licenses #374901 (comment 1111918356)
- Default not suitable for versions not yet listed #374901 (comment 1111918356)
TOPIC Accuracy when storing licenses of versions introducing a change #374901 (comment 1115937001)
TOPIC Composite primary keys #374901 (comment 1121926442)
TOPIC No compression for project-specific license data #374901 (comment 1122314183)
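The first topic above (storing licenses only for versions where licenses change) can be sketched as a sorted list of change points per component: a lookup only needs version comparison, via a binary search for the last change point at or before the requested version. This is a minimal in-memory sketch, assuming versions are recorded in ascending order and are plain dotted numbers; `LicenseChangePoints` and `parse_version` are hypothetical names, not part of any existing codebase.

```python
from __future__ import annotations

import bisect


def parse_version(version: str) -> tuple[int, ...]:
    # Hypothetical helper: plain dotted numeric versions only (e.g. "2.10.1").
    return tuple(int(part) for part in version.split("."))


class LicenseChangePoints:
    """Licenses of one component, stored only at versions that change them."""

    def __init__(self) -> None:
        self._keys: list[tuple[int, ...]] = []  # sorted change points
        self._licenses: list[frozenset[str]] = []  # licenses from that point on
        self._last_seen: tuple[int, ...] | None = None

    def record(self, version: str, licenses: frozenset[str]) -> bool:
        """Record the next version (ascending order); return True if a row was added."""
        key = parse_version(version)
        if self._last_seen is not None and key <= self._last_seen:
            raise ValueError(f"{version} is not newer than the last recorded version")
        self._last_seen = key
        if self._licenses and self._licenses[-1] == licenses:
            return False  # same licenses as the previous change point: nothing stored
        self._keys.append(key)
        self._licenses.append(licenses)
        return True

    def licenses_for(self, version: str) -> frozenset[str] | None:
        # Only version comparison is needed: take the last change point <= version.
        i = bisect.bisect_right(self._keys, parse_version(version))
        return self._licenses[i - 1] if i > 0 else None

    def __len__(self) -> int:
        return len(self._keys)


changes = LicenseChangePoints()
for version, licenses in [
    ("1.0.0", {"MIT"}),
    ("1.1.0", {"MIT"}),
    ("2.0.0", {"Apache-2.0"}),
    ("2.1.0", {"Apache-2.0"}),
]:
    changes.record(version, frozenset(licenses))

print(len(changes))  # prints: 2
print(changes.licenses_for("1.1.5"))  # prints: frozenset({'MIT'})
print(changes.licenses_for("0.9.0"))  # prints: None
```

Note that "1.1.5" was never recorded yet still resolves to MIT because it follows the 1.0.0 change point; whether that inference is acceptable is the accuracy question raised in the topic on storing licenses of versions introducing a change.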

Discoto Settings

---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending

See the settings schema for details.

Edited Oct 11, 2022 by Fabien Catteau