Update DB schema to store data imported from the License DB
Problem to solve
License data imported from the License Database must be directly available to the Rails backend, so that it can use it to perform License Scanning as soon as CycloneDX SBOMs are ingested.
Proposal
Create a relation table sbom_component_version_licenses
between sbom_component_versions
and software_licenses
to store license data imported from the License DB, and track licenses under which component versions have been published.
Since a component version can't have the same license multiple times, there's a unique index on (sbom_component_versions.id, software_licenses.id)
, and it's used as a primary composite key; this avoids having an id
column, and saves space.
It's assumed that license data imported from the License DB is immutable, and we can avoid the updated_at
column to save space. The created_at
column can be used to know when specific license data was added to the primary DB.
The proposed schema supports composite licenses, but it does not support SPDX licenses expression. In particular, it can't make the distinction between a conjunctive set and a disjunctive set.
The following is the SQL code corresponding to the proposal:
CREATE TABLE sbom_component_version_licenses (
component_version_id bigint REFERENCES sbom_component_version_licenses(id) ON DELETE CASCADE,
software_license_id integer REFERENCES software_licenses(id) ON DELETE CASCADE,
created_at timestamp with time zone NOT NULL,
PRIMARY KEY (component_version_id, software_license_id)
)
(Adding a primary key will automatically create a unique B-tree index on the column or group of columns listed in the primary key, and will force the column(s) to be marked NOT NULL.)
The expected size for a DB record is 8 + 4 + = 16 bytes.
Proposal 2 (accepted)
Create a set of tables representing metadata from public package registries. It'll be imported from the external database service and be read only from the monolith-side components which can then conduct queries against these pm_
tables on fields like package names and versions.
CREATE TABLE pm_packages (
id bigint NOT NULL,
purl_type smallint NOT NULL,
name text NOT NULL,
CONSTRAINT check_packages_name_max_length CHECK ((char_length(name) <= 255))
);
CREATE TABLE pm_package_versions (
id bigint NOT NULL,
package_id bigint NOT NULL REFERENCES pm.packages(id) ON DELETE CASCADE,
version text NOT NULL,
CONSTRAINT check_package_versions_version_max_length CHECK ((char_length(version) <= 255))
);
CREATE TABLE pm_licenses (
id integer NOT NULL,
spdx_identifier character varying(255) NOT NULL
);
CREATE TABLE pm_package_version_licenses (
package_version_id bigint REFERENCES pm.package_versions(id) ON DELETE CASCADE,
license_id integer REFERENCES pm.licenses(id) ON DELETE CASCADE,
created_at timestamp with time zone NOT NULL,
PRIMARY KEY (component_version_id, software_license_id)
)
Implementation plan
Using proposal 2.
-
Create migrations -
Create models and specs for their validation -
Use separate schema alias -
Test partitions with this schema by - Analyze common select queries and index stats with and without partitions and discuss in MR