Seed package metadata tables to demo License Scanning SBOM Scanner
Problem to solve
We need to seed the package metadata tables (pm_*
tables) of the Rails backend in order to demo the new License Scanning implementation connected to License DB.
It should be possible to run the demo on a local environment (GDK) where the License Scanning SBOM scanner is enabled. So the script that seeds the DB doesn't have to be production-ready, and it's not part of the product.
We only need to seed the DB with enough data to demonstrates License Scanning in a given project. There's no need to seed the DB with all the data exported by License DB.
Proposal
- Download CSV files exported by License DB. Save them in a directory accessible to the Postgres server.
- Import these CSV files to dedicated tables using
COPY
. - Upsert packages, versions, licenses, and relationships b/w versions and licenses using
INSERT INTO ... ON CONFLICT
. - Check imported data by running a
SELECT
query that joins all the aforementioned tables, and present the result like in the CSV files.
Further details
See https://www.postgresql.org/docs/15/sql-copy.html
It is recommended that the file name used in COPY always be specified as an absolute path. This is enforced by the server in the case of COPY TO, but for COPY FROM you do have the option of reading from a file specified by a relative path. The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
FREEZE
Requests copying the data with rows already frozen, just as they would be after running the VACUUM FREEZE command. This is intended as a performance option for initial data loading. Rows will be frozen only if the table being loaded has been created or truncated in the current subtransaction, there are no cursors open and there are no older snapshots held by this transaction. It is currently not possible to perform a COPY FREEZE on a partitioned table.
FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
See the license-db deployment for the location of the bucket.
Implementation plan
-
Provide a script and/or instruction to seed the package metadata table. #389417 (comment 1254219749)