Implement Dual Data Source Support for NPM Feeder with deps.dev Integration
Problem
Our NPM Feeder currently lacks the flexibility to choose between different data sources dynamically, which limits our ability to use comprehensive and up-to-date package information. To improve data handling and accuracy, we propose integrating deps.dev as a selectable data source.
Solution
The proposed solution is to enhance the NPM Feeder by allowing it to switch between the existing repository data and deps.dev based on a new --source configuration flag.
Implementation Plan
Feeder changes
- Add a --source flag that takes one of two values: registry or deps.dev. This optional flag specifies which data source to use. The default value can be registry. Then update the feeders with a new function named WithSource. Each feeder can implement its own fallback logic: if the source is unknown, or is not supported by that feeder, use the default. We should make sure that Cargo implements deps.dev by default.
- Add a new npm.DepsDevFeeder.
- Dispatch the npm feeder based on the source value.
- Create a query for the delta of the last two snapshots and verify the results.
- Using the cargo feeder as a guide, implement the deps.dev source in npm.DepsDevFeeder.
- Extract the deps.dev code that the npm and cargo feeders have in common, such as latestSnapshotQuery, the snapshot struct, etc.
- When retrieving data for NPM, make sure to leave out expressions that contain dependency versions. For more information, see #458137 (comment 1887570307).
- Extend the Package struct with a source field.
- Release a new version.
- Test E2E.
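The fallback rule described for WithSource (unknown or unsupported source falls back to the feeder's default) could look roughly like the sketch below. The Source type, its constants, and the ResolveSource helper are hypothetical names used only for illustration; the real feeders may structure this differently.

```go
package main

import "fmt"

// Source identifies where a feeder reads package data from.
// The type and constant names are hypothetical, not taken from the feeder code.
type Source string

const (
	SourceRegistry Source = "registry"
	SourceDepsDev  Source = "deps.dev"
)

// ResolveSource returns the requested source when the feeder supports it.
// An empty, unknown, or unsupported value falls back to the feeder's default.
func ResolveSource(requested string, def Source, supported ...Source) Source {
	for _, s := range supported {
		if Source(requested) == s {
			return s
		}
	}
	return def
}

func main() {
	// npm supports both sources and defaults to the registry.
	fmt.Println(ResolveSource("deps.dev", SourceRegistry, SourceRegistry, SourceDepsDev))
	fmt.Println(ResolveSource("bogus", SourceRegistry, SourceRegistry, SourceDepsDev))

	// Cargo only supports deps.dev, so every request resolves to deps.dev.
	fmt.Println(ResolveSource("registry", SourceDepsDev, SourceDepsDev))
}
```

Keeping the supported set and the default per feeder means the CLI can pass the raw flag value through unchanged, and each feeder decides what to do with it.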
Deployment changes
Scripts Changes
- Update run_feeder.sh to check for the LICENSES_SOURCE env var. If the var is not set, use --licenses-source=registry; otherwise use --licenses-source=$LICENSES_SOURCE.
- Update .gitlab-ci.yml to use the latest version of the feeder.
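The actual change belongs in run_feeder.sh, but the fallback it needs is small enough to state precisely. The sketch below expresses the same rule in Go for consistency with the rest of the feeder. The licensesSourceArg helper is hypothetical, and it assumes that an empty LICENSES_SOURCE should be treated the same as an unset one.

```go
package main

import (
	"fmt"
	"os"
)

// licensesSourceArg builds the flag that run_feeder.sh would pass to the feeder.
// getenv is injected so the rule can be exercised without touching the real
// environment. An empty value is treated as "not set" (an assumption).
func licensesSourceArg(getenv func(string) string) string {
	source := getenv("LICENSES_SOURCE")
	if source == "" {
		source = "registry"
	}
	return "--licenses-source=" + source
}

func main() {
	// Prints the argument for whatever LICENSES_SOURCE is in the environment.
	fmt.Println(licensesSourceArg(os.Getenv))
}
```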
Documentation changes
- Document the new data source in the readme file.
- Update the feeder-testing section with the new flag.
- Update the section of the deployment docs on how to run the feeder with deps.dev as a source.
- Add a table comparing which sources are currently supported (not sure we really need this if our intention is to drop registry.repo as a source entirely).
Outline
Here is a guide indicating the essential code changes that act as an entry point:
sample code
commit 3e5421a6526c8b67f46d109be35ee1cce46eba15
Author: Philip Cunningham <pcunningham@gitlab.com>
Date: Tue Apr 30 14:54:38 2024 +0100
Add scaffolding for NPM deps.dev
diff --git a/cmd/license-feeder/main.go b/cmd/license-feeder/main.go
index 35d39ce..afdad16 100644
--- a/cmd/license-feeder/main.go
+++ b/cmd/license-feeder/main.go
@@ -33,6 +33,12 @@ import (
var pubsubClient = &pubsub.Client{}
var (
+ feederDataSource = &cli.StringFlag{
+ Name: "data-source",
+ Usage: "Specify the data source for the feeder: 'repo' or 'deps.dev'.",
+ EnvVars: []string{"DATA_SOURCE"},
+ }
+
feederSendTopic = &cli.StringFlag{
Name: "send-topic",
Usage: "The name of the topic to send messages out to",
@@ -126,6 +132,7 @@ func main() {
var err error
ctx := context.Background()
+ dataSource := cCtx.String("data-source")
topicID := cCtx.String("send-topic")
projectID := cCtx.String("project")
registry := cCtx.String("registry")
@@ -174,10 +181,11 @@ func main() {
state.WithIgnoreValue(ignoreCursorStart),
)
- return runFeed(ctx, cCtx, logger, registry, cursor, pub, projectID)
+ return runFeed(ctx, cCtx, logger, registry, dataSource, cursor, pub, projectID)
},
Flags: []cli.Flag{
feederRegistry,
+ feederDataSource,
feederSendTopic,
feederProject,
feederEnv,
@@ -198,7 +206,7 @@ func main() {
}
}
-func runFeed(ctx context.Context, cCtx *cli.Context, logger *zerolog.Logger, registry string, cursor state.Cursor, pub publisher.Publisher, projectID string) error {
+func runFeed(ctx context.Context, cCtx *cli.Context, logger *zerolog.Logger, registry string, dataSource string, cursor state.Cursor, pub publisher.Publisher, projectID string) error {
var feed feeders.LicenseFeeder
switch registry {
@@ -237,13 +245,18 @@ func runFeed(ctx context.Context, cCtx *cli.Context, logger *zerolog.Logger, reg
packagist.WithLogger(logger),
)
case npm.RegistryName:
- feed = npm.New(
- npm.WithLogger(logger),
- npm.WithPublisher(pub),
- npm.WithCursor(cursor),
- npm.WithNpmRegistryURL(cCtx.String("npm-registry-url")),
- npm.WithNpmRegistryAuth(cCtx.String("npm-registry-auth")),
- )
+ switch dataSource {
+ case "deps.dev":
+ feed = npm.NewDepsDev()
+ default:
+ feed = npm.New(
+ npm.WithLogger(logger),
+ npm.WithPublisher(pub),
+ npm.WithCursor(cursor),
+ npm.WithNpmRegistryURL(cCtx.String("npm-registry-url")),
+ npm.WithNpmRegistryAuth(cCtx.String("npm-registry-auth")),
+ )
+ }
case golang.RegistryName:
feed = golang.New(
golang.WithLogger(logger),
diff --git a/feeders/license/npm/depsdevfeeder.go b/feeders/license/npm/depsdevfeeder.go
new file mode 100644
index 0000000..327160d
--- /dev/null
+++ b/feeders/license/npm/depsdevfeeder.go
@@ -0,0 +1,24 @@
+package npm
+
+import (
+	"context"
+)
+
+// DepsDevFeeder is a struct that implements the LicenseFeeder interface
+type DepsDevFeeder struct{}
+
+// NewDepsDev creates a new NPM feeder struct backed by deps.dev
+func NewDepsDev(opts ...func(*DepsDevFeeder)) *DepsDevFeeder {
+	return &DepsDevFeeder{}
+}
+
+// RegistryName returns the name of the registry for JavaScript packages
+func (f *DepsDevFeeder) RegistryName() string {
+	return RegistryName
+}
+
+// Feed the list of JavaScript packages to the interfacer
+func (f *DepsDevFeeder) Feed(ctx context.Context) error {
+	// TODO
+	return nil
+}
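The scaffolding accepts functional options but does not apply them yet. As a possible next step, here is a minimal standalone sketch of NewDepsDev applying its options, plus a compile-time check that the struct satisfies the feeder interface. Several parts are assumptions: the LicenseFeeder interface is a local stand-in inferred from the two methods in the scaffold, the RegistryName value is a placeholder, WithDepsDevLogger is a hypothetical option, and the standard library logger stands in for the zerolog logger used in main.go.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"
)

// LicenseFeeder is a local stand-in for feeders.LicenseFeeder, inferred from
// the methods the scaffold implements (an assumption).
type LicenseFeeder interface {
	RegistryName() string
	Feed(ctx context.Context) error
}

// RegistryName is a placeholder value; the real constant lives in the npm package.
const RegistryName = "npm"

// DepsDevFeeder reads NPM license data from deps.dev.
type DepsDevFeeder struct {
	logger *log.Logger
}

// Compile-time check that DepsDevFeeder satisfies the interface.
var _ LicenseFeeder = (*DepsDevFeeder)(nil)

// WithDepsDevLogger is a hypothetical option that sets the feeder's logger.
func WithDepsDevLogger(l *log.Logger) func(*DepsDevFeeder) {
	return func(f *DepsDevFeeder) { f.logger = l }
}

// NewDepsDev builds the feeder with a silent default logger, then applies
// each option in order so later options override earlier ones.
func NewDepsDev(opts ...func(*DepsDevFeeder)) *DepsDevFeeder {
	f := &DepsDevFeeder{logger: log.New(io.Discard, "", 0)}
	for _, opt := range opts {
		opt(f)
	}
	return f
}

// RegistryName returns the name of the registry for JavaScript packages.
func (f *DepsDevFeeder) RegistryName() string { return RegistryName }

// Feed is still a stub; it only logs that nothing is implemented yet.
func (f *DepsDevFeeder) Feed(ctx context.Context) error {
	f.logger.Println("deps.dev feed not implemented yet")
	return nil
}

func main() {
	feed := NewDepsDev(WithDepsDevLogger(log.New(os.Stderr, "npm/deps.dev ", 0)))
	fmt.Println(feed.RegistryName(), feed.Feed(context.Background()))
}
```

Setting a usable default before applying options keeps NewDepsDev() safe to call with no arguments, which is how the dispatch in main.go currently calls it.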
Edited by Nick Ilieskou