Add Package Metadata Compaction During Ingestion in GitLab Rails
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Proposal
In database design, log compaction is a technique that has been used to minimize WAL space usage. The design of our package metadata export is very similar to a database's WAL, and as a result, it can benefit greatly from the log compaction strategy. This technique, if implemented, leads to a much faster sync, and a reduced egress from the buckets storing the export files. Some possibilities:
- Download all export files locally, and recursively run the log compaction algorithm. No maintenance of checkpoints.
- Periodically create checkpoints that can serve as a starting point. The backend ingestion can check if no checkpoints exist, and then select the latest checkpoint.
Edited by 🤖 GitLab Bot 🤖