Allow tagging repository metadata
For most of the datasources we have within DMD, we base our queries off the platform, organisation, and repo, which match a specific repository in a source forge.
However, across a set of repositories, not all of these are equal: some may be the "money maker" for the organisation, some may be example projects or demos, and some may be regular, but important, services.
We should make it possible to attach additional metadata to these repositories, so that queries, advisories, and policies can take that context into account (e.g. via #271 (closed)).
This can then feed into sensitive_packages (and also #274 (closed), #303).
An example of what this may look like:
CREATE TABLE IF NOT EXISTS repository_metadata (
  platform TEXT NOT NULL,
  organisation TEXT NOT NULL,
  repo TEXT NOT NULL,
  is_monorepo BOOLEAN NOT NULL,
  -- Whether this is a forked repository. This could indicate that this is a temporary repository, a long-standing fork for security + supply-chain hygiene purposes, or some other reason.
  is_fork BOOLEAN NOT NULL,
  -- repository_usage is a free-form field for enum-style data, for instance `LIBRARY`, `SERVICE`, or `EXAMPLE_CODE`. For additional usage information, you can use the `additional_metadata` field
  repository_usage TEXT NOT NULL,
  -- the repository's visibility in the source forge
  -- NOTE that this may need more thinking for how it's used, e.g. if using a VPN'd off source forge
  visibility TEXT NOT NULL
    CHECK (
      visibility IN (
        'PUBLIC',
        'PRIVATE',
        'INTERNAL'
      )
    ),
  -- text description of the repo for more context, could include links out to other systems, e.g. a Service Catalog
  description TEXT,
  -- a JSON object of additional key-value data
  additional_metadata TEXT,
  UNIQUE (platform, organisation, repo) ON CONFLICT REPLACE
);
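To illustrate how this table might be used, here is a minimal sketch using Python's built-in sqlite3 module. It is not how DMD itself would implement this. The `example_dependencies` table, the `tag_repository` and `packages_for_usage` helpers, and all of the sample data are hypothetical and only stand in for a real datasource table keyed on platform, organisation, and repo.

```python
import json
import sqlite3

# Schema from the proposal, with string literals single-quoted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repository_metadata (
  platform TEXT NOT NULL,
  organisation TEXT NOT NULL,
  repo TEXT NOT NULL,
  is_monorepo BOOLEAN NOT NULL,
  is_fork BOOLEAN NOT NULL,
  repository_usage TEXT NOT NULL,
  visibility TEXT NOT NULL
    CHECK (visibility IN ('PUBLIC', 'PRIVATE', 'INTERNAL')),
  description TEXT,
  additional_metadata TEXT,
  UNIQUE (platform, organisation, repo) ON CONFLICT REPLACE
);

-- ASSUMPTION: a simplified, hypothetical stand-in for a datasource table.
CREATE TABLE IF NOT EXISTS example_dependencies (
  platform TEXT NOT NULL,
  organisation TEXT NOT NULL,
  repo TEXT NOT NULL,
  package_name TEXT NOT NULL
);
"""


def tag_repository(conn, platform, organisation, repo, usage, visibility,
                   is_monorepo=False, is_fork=False, description=None,
                   additional_metadata=None):
    """Insert metadata for a repository, replacing any existing row for it."""
    conn.execute(
        "INSERT INTO repository_metadata VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (platform, organisation, repo, is_monorepo, is_fork, usage, visibility,
         description,
         json.dumps(additional_metadata) if additional_metadata is not None else None),
    )


def packages_for_usage(conn, usage):
    """List packages used by repositories tagged with the given usage."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.package_name
        FROM example_dependencies AS d
        INNER JOIN repository_metadata AS m
          ON d.platform = m.platform
         AND d.organisation = m.organisation
         AND d.repo = m.repo
        WHERE m.repository_usage = ?
        ORDER BY d.package_name
        """,
        (usage,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data.
tag_repository(conn, "github", "example-org", "payments", "SERVICE", "PRIVATE",
               additional_metadata={"tier": "1"})
tag_repository(conn, "github", "example-org", "demo-app", "EXAMPLE_CODE", "PUBLIC")
# Tagging the same repository again replaces the earlier row (ON CONFLICT REPLACE).
tag_repository(conn, "github", "example-org", "demo-app", "EXAMPLE_CODE", "INTERNAL")

conn.executemany(
    "INSERT INTO example_dependencies VALUES (?, ?, ?, ?)",
    [
        ("github", "example-org", "payments", "example-crypto-lib"),
        ("github", "example-org", "payments", "example-http-lib"),
        ("github", "example-org", "demo-app", "example-http-lib"),
    ],
)

service_packages = packages_for_usage(conn, "SERVICE")
print(service_packages)
```

Joining on the three key columns lets a query, advisory, or policy narrow its results to, say, only repositories tagged as `SERVICE`, while the `CHECK` constraint rejects any visibility value outside the three allowed ones.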
This could also include:
- spec.lifecycle, via Backstage
Edited by Jamie Tanna