Improve latency of initial Package Metadata sync for self-managed and dedicated environments
Release notes
Problem to solve
The Package Metadata (External License-DB) sync is a mechanism that allows the Rails platform to load the data needed to support the Continuous Vulnerability Scanning and License Scanning features.
The sync is relatively fast for advisory data, but for license data it can take several hours (up to a couple of days, depending on the system) before the instance has fully loaded the available data.
This results in License Scanning reporting "unknown" for licenses that would normally be detected correctly. This has already generated a couple of customer support requests.
Proposal
There are many opportunities to improve the situation and several suggestions have been made:
- Add warning to the documentation
- Highlight the current situation in troubleshooting guidelines
- Suggest that instance admins disable the package types they don't need? (This risks missing those types if they are needed in the future, though.)
- Show sync progress/status on the admin page
  - With the current implementation it might not be possible to give a percentage of progress, but it should be doable to say whether there is data available but not yet synced.
  - With further development, we could consider exposing a counter of available entities in the GCP bucket, which the Rails app could compare with the local data it has.
- Improve speed of sync
  - We currently throttle the sync process to prevent resource contention. This is sub-optimal, however, and newer approaches are available, e.g. throttling based on database health status.
  - Partitioning the table per purl_type, as we initially planned, could allow several jobs to run in parallel (one per purl_type) and thus reduce the overall sync time compared with the current sequential approach. This is particularly beneficial because some purl_types, such as npm, have a huge number of packages to sync and can block other types for a while with the current implementation.
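
To make the progress/status suggestion more concrete, here is a minimal sketch of the counter comparison, assuming the bucket could expose a per-purl_type count of available entities. The method name `sync_status`, the hash shapes, and the state names are all hypothetical and do not exist in the Rails app today.

```ruby
# Hypothetical sketch: compares counts published alongside the export
# (assumed to be available per purl_type) with counts from the local database.
# Neither data source exists in this form today.
def sync_status(available_counts, local_counts)
  available_counts.to_h do |purl_type, available|
    synced = local_counts.fetch(purl_type, 0)
    # Clamp at zero in case the local table holds more rows than the export.
    pending = [available - synced, 0].max
    state = pending.zero? ? :up_to_date : :data_pending
    [purl_type, { synced: synced, pending: pending, state: state }]
  end
end

# Illustrative numbers only.
available = { "npm" => 1_000, "maven" => 200 }
local = { "npm" => 250, "maven" => 200 }

sync_status(available, local).each do |purl_type, status|
  puts "#{purl_type}: #{status[:state]} (#{status[:pending]} pending)"
end
```

Even without exact percentages, a coarse per-type state like this would let an admin tell "still syncing" apart from "license genuinely unknown".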
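
The per-purl_type parallelism idea can be illustrated with a rough sketch. Plain Ruby threads stand in here for whatever background job mechanism would actually be used, and `ingest` and `sync_in_parallel` are hypothetical placeholders rather than existing code. The point is only that a large type such as npm no longer blocks the smaller ones.

```ruby
# Hypothetical placeholder for writing one batch of records into the
# partition for a given purl_type; returns the number of records written.
def ingest(batch)
  batch.size
end

# Runs one worker per purl_type instead of processing types sequentially.
# Threads are a stand-in for background jobs in this sketch.
def sync_in_parallel(batches_by_purl_type)
  threads = batches_by_purl_type.map do |purl_type, batches|
    Thread.new do
      count = batches.sum { |batch| ingest(batch) }
      [purl_type, count]
    end
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).to_h
end

# Illustrative data only: npm has far more batches than the other types.
batches = {
  "npm" => Array.new(50) { Array.new(100, :package) },
  "maven" => [Array.new(20, :package)],
  "gem" => [Array.new(10, :package)]
}

sync_in_parallel(batches).each do |purl_type, count|
  puts "#{purl_type}: #{count} records synced"
end
```

A real implementation would still need throttling (for example, based on database health) so that the parallel jobs do not reintroduce the resource contention the current throttle is there to prevent.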