Spike: How to reduce the package metadata tables' footprint on the Rails instance database
Background
Before we can safely enable the sync of package metadata between the external license DB and GitLab self-managed Rails instances, we need to reduce the total DB footprint of the corresponding tables (data + indexes).
Currently we have:
- Tables only: ~12 GB
- Indexes only: ~14 GB
- Total: ~26 GB
Proposal
🤖 Auto-Summary
Discoto Usage
Points
Discussion points are declared by headings, list items, and single lines that start with the text "point:" (case-insensitive). For example, the following are all valid points:
#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
Note that any markdown used in the point text will also be propagated into the topic summaries.
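The matching rule above can be approximated with a single regular expression. This is a minimal sketch, not Discoto's actual implementation: the pattern, the name `POINT_RE`, and the helper `extract_points` are all assumptions made for illustration.

```python
import re

# Assumed pattern (not taken from Discoto): optional heading hashes or a
# list marker, followed by "point:" in any letter case, then the point text.
POINT_RE = re.compile(
    r"^\s*(?:#{1,6}\s+|[*+-]\s+)?point:\s*(?P<text>.+)$",
    re.IGNORECASE,
)


def extract_points(markdown: str) -> list[str]:
    """Return the text of every line that declares a point."""
    points = []
    for line in markdown.splitlines():
        match = POINT_RE.match(line)
        if match:
            # Markdown inside the point text is kept untouched.
            points.append(match.group("text").strip())
    return points
```

Running `extract_points` over the five example lines above should yield one entry per line, with the bold markers in the last example preserved.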
Topics
Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.
Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with "topic:" (case-insensitive). For example, the following are all valid topics:
# Topic: Inline discussion topic 1
## TOPIC: **{+A Green, bolded topic+}**
### tOpIc: Another topic
Quick Actions
| Action | Description |
| /discuss sub-topic TITLE | Create an issue for a sub-topic. Does not work in epics |
| /discuss link ISSUABLE-LINK | Link an issuable as a child of this discussion |
Last updated by this job
- TOPIC Remove unknown licenses #407454 (comment 1353720354)
  - This is a cheap change to make in terms of eng time. #407454 (comment 1353720354)
  - Can be done concurrently with other solutions. #407454 (comment 1353720354)
- TOPIC "copy-on-write" duplicates #407454 (comment 1353868760)
  - Relatively cheap change in terms of eng time. #407454 (comment 1353868760)
  - Does not significantly touch the schema, allowing faster iteration. #407454 (comment 1353868760)
  - Missing or unknown records automatically get default licenses. #407454 (comment 1353868760)
  - If a default license comes in the wrong order (e.g. MIT for the first version, Apache for the next 100), the DB will revert to having lots of duplicates. #407454 (comment 1353868760)
  - TOPIC Store a version range for unique license set #407454 (comment 1353876905)
- TOPIC Don't store package metadata in the database #407454 (comment 1357167221)
  - New query infrastructure. Probably new to the monolith itself. #407454 (comment 1357167221)
  - Has edge cases which have performance downsides. #407454 (comment 1357167221)
  - Good performance for a large number of use cases. #407454 (comment 1357167221)
  - Dependencies only rarely change their licenses. #407454 (comment 1357167221)
  - On-disk dataset is quite small. #407454 (comment 1357167221)
  - Dataset growth is quite stable. #407454 (comment 1357167221)
  - Data used in memory is proportional to the data in the instance. #407454 (comment 1357167221)
  - Non-GitLab SaaS in-memory data limited by purl_type. #407454 (comment 1357167221)
  - Takes advantage of the 80/20 rule. #407454 (comment 1357167221)
- TOPIC api-based approach #407454 (comment 1357360794)
  - Removes all storage headaches from the monolith side. #407454 (comment 1357360794)
  - Fairly easy to implement on the monolith side. #407454 (comment 1357360794)
  - Medium complexity implementing on the external DB side. #407454 (comment 1357360794)
  - Large complexity building an authorization layer for instances, because we need to check whether an instance is authorized to use this external component. #407454 (comment 1357360794)
  - Not a solution for offline instances. #407454 (comment 1357360794)
  - TOPIC accuracy of result for newer versions #407454 (comment 1358907453)
  - TOPIC Fallback behaviour #407454 (comment 1373548188)
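To make the "copy-on-write" points concrete, here is a minimal in-memory sketch of the idea as the summary describes it: each package keeps one default license set, and a per-version row is written only when a version differs from that default. The class `PackageLicenses`, its fields, and the rule that the first ingested version sets the default are assumptions for illustration; the discussion does not specify the actual schema or ingestion logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class PackageLicenses:
    """Hypothetical stand-in for the rows stored for a single package."""

    # License set shared by every version without an explicit override.
    default: Optional[frozenset[str]] = None
    # Per-version rows, written only when a version differs from the default.
    overrides: dict[str, frozenset[str]] = field(default_factory=dict)

    def ingest(self, version: str, licenses: Iterable[str]) -> None:
        license_set = frozenset(licenses)
        if self.default is None:
            # Assumption: the first version seen decides the default.
            self.default = license_set
        elif license_set != self.default:
            # "Copy on write": store a row only for a differing version.
            self.overrides[version] = license_set

    def lookup(self, version: str) -> frozenset[str]:
        # Missing or unknown versions fall back to the default license set.
        if version in self.overrides:
            return self.overrides[version]
        return self.default if self.default is not None else frozenset()
```

Under these assumptions, ingesting one MIT version followed by 100 Apache-2.0 versions leaves 100 override rows, which is the "wrong order" case raised in the topic. Ingesting the same versions with Apache-2.0 first leaves a single override for the MIT version.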
Discoto Settings
---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending
See the settings schema for details.