Spike: Estimate and optimize storage of components imported from License DB

Time-box: 2 days

Topic to Evaluate

In the following thread, we raised concerns about the amount of data stored in the primary Postgres DB when importing from the external License DB: &8492 (comment 1056458042). We need to address these concerns before [PROMOTED] Sync Rails backend with License DB (#373032 - closed), and possibly adopt a strategy to limit the data size.

This other thread provides inputs for estimating how much the primary DB would grow if we imported everything: #372212 (comment 1084617961)

Finally, this comment suggests strategies to skip components and versions when importing from the License DB, to optimize storage: #374901 (comment 1118718916)

Tasks to Evaluate

Compare the following import strategies:

  • Import all components with license data, and only their significant versions.
  • Import all components and all versions with license data.
  • Restrict the import to package types that have been selected instance-wide.
  • Import license data only for components that already exist in the DB. These were inserted when ingesting project SBOMs or vulnerability advisories.
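
To make the comparison concrete, the four strategies can be read as filters over the rows offered by the License DB. The sketch below is illustrative only: the `PackageVersion` record, its fields, and the `is_significant` flag are hypothetical and do not reflect the actual License DB export schema, nor how "significant versions" would be defined.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Strategy(Enum):
    SIGNIFICANT_VERSIONS = auto()   # all components, significant versions only
    ALL_VERSIONS = auto()           # all components and all versions
    ENABLED_PACKAGE_TYPES = auto()  # only package types selected instance-wide
    EXISTING_COMPONENTS = auto()    # only components already present in the DB


@dataclass(frozen=True)
class PackageVersion:
    # Hypothetical shape of one License DB row; not the real schema.
    package_type: str
    name: str
    version: str
    is_significant: bool


def select_rows(rows, strategy, enabled_types=frozenset(), existing=frozenset()):
    """Return the rows that the given import strategy would keep.

    `enabled_types` is a set of package type names.
    `existing` is a set of (package_type, name) pairs already in the DB.
    """
    if strategy is Strategy.ALL_VERSIONS:
        return list(rows)
    if strategy is Strategy.SIGNIFICANT_VERSIONS:
        return [r for r in rows if r.is_significant]
    if strategy is Strategy.ENABLED_PACKAGE_TYPES:
        return [r for r in rows if r.package_type in enabled_types]
    if strategy is Strategy.EXISTING_COMPONENTS:
        return [r for r in rows if (r.package_type, r.name) in existing]
    raise ValueError(f"unknown strategy: {strategy}")
```

Counting the rows returned for each strategy on a representative sample gives a first-order comparison of how many components and versions each option would import.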

To estimate the size of license data, we consider the compression selected in Spike: Efficient storage of redundant licenses ... (#374901 - closed).
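
A back-of-the-envelope estimate for one strategy could look like the sketch below. It assumes, purely for illustration, that storage splits into a fixed cost per component row, a fixed cost per version row, and license data attached to each version that shrinks by some compression ratio. The function name, its parameters, and this cost model are assumptions; the actual compression scheme from #374901 is not described here and may not reduce to a single ratio.

```python
def estimate_size_bytes(
    components,
    versions,
    bytes_per_component,
    bytes_per_version,
    license_bytes_per_version,
    compression_ratio,
):
    """Estimate the storage needed for one import strategy, in bytes.

    compression_ratio is the fraction of the raw license data that remains
    after compression (1.0 means no compression). All inputs are placeholders
    to be replaced with measured values.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    # Fixed row costs for components and versions.
    base = components * bytes_per_component + versions * bytes_per_version
    # License payload, reduced by the assumed compression ratio.
    licenses = versions * license_bytes_per_version * compression_ratio
    return int(base + licenses)
```

Running this once per strategy, with the component and version counts each strategy would import, yields comparable figures for the "Estimated data size" aspect.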

Aspects to be considered:

  • Estimated data size
  • Complexity of the import
  • How this benefits features, including vulnerability scanning
  • How this harms features, including vulnerability scanning

Finally,

Edited by Lucas Charles