Reduce sbom_occurrences table growth
Why are we doing this work
Currently, we are essentially using sbom_occurrences
as an append-only log and this doesn't scale.
When enabling the cyclonedx_sbom_ingestion
feature flag, we saw approximately 1 million sbom_occurrences
created per hour. This is because we're storing all the occurrences created in each pipeline. We should abandon the append-only log approach.
Instead, we should add columns tracking when the dependency appeared. For example:
column | type | description |
---|---|---|
latest_default_branch_pipeline_id | bigint | Latest pipeline on the default branch where this dependency appeared |
present_on_default_branch | bool | True if this dependency is currently present on the default branch |
This allows us to fullfil the current project requirements, which is to provide the same functionality as the existing dependency list. The dependencies which appear in the list will be the ones that have present_on_default_branch = true
and the latest_default_branch_pipeline_id
can be used to show the last successful scan. In the future, it we would like to show dependencies on other branches, we will need to come up with a different solution for that.
Relevant links
Non-functional requirements
-
Documentation: -
Feature flag: -
Performance: -
Normalising will increase the amount of joins we have to do. Test the potential impact.
-
-
Testing:
Implementation plan
- Write database migrations to:
- Delete all existing rows from
sbom_occurrences
(DB lab says this would take around 7 seconds) - Remove
index_sbom_occurrences_on_ingestion_attributes
- Create a new unique index on
%i[project_id component_id component_version_id source_id]
- Delete all existing rows from
- Update the ingestion tasks:
- Update the ingestion pipeline so that
Sbom::Ingestion::IngestReportsService
receives a list of all thesbom_occurrence
ids (ingested_ids
) that were present in the ingested reports. - Pass the
ingested_ids
to a new serviceSbom::Ingestion::PostIngestionService
- Update the ingestion pipeline so that
- Have the
PostIngestionService
iterate through all the occurrences in the given project. - Delete the records which aren't present in
ingested_ids
in batches of 100. - Update the
pipeline_id
andcommit_sha
of the already-existing records to that of the current pipeline
Verification steps
🤖
Auto-Summary Discoto Usage
Points
Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive)
point:
. For example, the following are all valid points:
#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
Note that any markdown used in the point text will also be propagated into the topic summaries.
Topics
Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.
Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive)
topic:
. For example, the following are all valid topics:
# Topic: Inline discussion topic 1
## TOPIC: **{+A Green, bolded topic+}**
### tOpIc: Another topic
Quick Actions
Action Description /discuss sub-topic TITLE
Create an issue for a sub-topic. Does not work in epics /discuss link ISSUABLE-LINK
Link an issuable as a child of this discussion
Last updated by this job
- TOPIC Ignored columns #373781 (comment 1198216147)
- TOPIC Partitioning #373781 (comment 1198227876)
- TOPIC Delete #373781 (comment 1198240485)
- TOPIC Indexes #373781 (comment 1198252592)
Discoto Settings
---
summary:
max_items: -1
sort_by: created
sort_direction: ascending
See the settings schema for details.