Spike: Assess golang package differences between deps.dev and license-db
Topic to evaluate
The original research spike noted that license-db contains four times as many Go package-versions as deps.dev. This disparity must be explained before deps.dev can be adopted as a substitute for the current data source.
Proposal
Investigate the difference between the two data sources and explain the disparity.
One possible method is to spot-check well-known packages. Wherever a significant disparity is found, list the actual versions of the package at its source and compare them with the versions recorded in each data source. Use this to determine whether there is a systematic error or a categorization difference between the two data sources.
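The per-package spot check amounts to a set difference in both directions. A minimal Go sketch, assuming the versions at the source (for example, git tags) and the versions in a data source have already been fetched as string slices; `diffVersions` and the sample version lists are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// diffVersions is a hypothetical helper that compares the versions found at a
// package's source with the versions recorded in a data source. It returns the
// versions missing from the data source and the versions only the data source has.
func diffVersions(atSource, inDataSource []string) (missing, extra []string) {
	src := make(map[string]bool, len(atSource))
	for _, v := range atSource {
		src[v] = true
	}
	ds := make(map[string]bool, len(inDataSource))
	for _, v := range inDataSource {
		ds[v] = true
	}
	for v := range src {
		if !ds[v] {
			missing = append(missing, v)
		}
	}
	for v := range ds {
		if !src[v] {
			extra = append(extra, v)
		}
	}
	// Map iteration order is random, so sort (lexically) for stable output.
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	// Hypothetical version lists for one module.
	source := []string{"v1.0.0", "v1.1.0", "v1.2.0"}
	depsDev := []string{"v1.0.0", "v1.2.0"}
	missing, extra := diffVersions(source, depsDev)
	fmt.Println("missing:", missing)
	fmt.Println("extra:", extra)
}
```

Running the same comparison against both data sources shows which one diverges from the source, rather than only that they diverge from each other.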
Tasks to Evaluate
Following the proposal above:
- Generate data:
  - Using the latest snapshot, group packages and count their versions in the PackageVersions table for System=Go.
  - Do the same using the license-db.go_license table.
  - Export both as JSON.
- Compare the two exports by joining them on package name.
- Analyze:
  - Find packages with significant disparities.
  - For each such package, compare the versions (commits, tags) at the source against those in each data source.
- Assess whether deps.dev is missing versions or whether license-db is over-classifying packages.
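The join-and-flag steps above could look roughly like the following Go sketch. It assumes each JSON export is an array of objects with hypothetical `name` and `versions` fields; the field names, the 2x threshold, and the helpers `loadCounts` and `findDisparities` are illustrative assumptions, not the actual export format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// packageCount is one row of a hypothetical JSON export.
type packageCount struct {
	Name     string `json:"name"`
	Versions int    `json:"versions"`
}

// disparity records a package whose version counts differ between sources.
type disparity struct {
	Name      string
	DepsDev   int
	LicenseDB int
}

// loadCounts decodes an export into a map of package name -> version count.
func loadCounts(data []byte) (map[string]int, error) {
	var rows []packageCount
	if err := json.Unmarshal(data, &rows); err != nil {
		return nil, err
	}
	counts := make(map[string]int, len(rows))
	for _, r := range rows {
		counts[r.Name] = r.Versions
	}
	return counts, nil
}

// findDisparities performs a full outer join on package name (a package absent
// from one source counts as 0 there) and keeps packages where the larger count
// is at least ratio times the smaller one.
func findDisparities(depsDev, licenseDB map[string]int, ratio float64) []disparity {
	names := make(map[string]bool)
	for n := range depsDev {
		names[n] = true
	}
	for n := range licenseDB {
		names[n] = true
	}
	var out []disparity
	for n := range names {
		d, l := depsDev[n], licenseDB[n]
		lo, hi := d, l
		if lo > hi {
			lo, hi = hi, lo
		}
		if hi > 0 && float64(hi) >= ratio*float64(lo) {
			out = append(out, disparity{Name: n, DepsDev: d, LicenseDB: l})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	// Hypothetical exports; real data would come from the two queries above.
	dd, err := loadCounts([]byte(`[{"name":"example.com/a","versions":10},{"name":"example.com/b","versions":3}]`))
	if err != nil {
		panic(err)
	}
	ld, err := loadCounts([]byte(`[{"name":"example.com/a","versions":11},{"name":"example.com/b","versions":12},{"name":"example.com/c","versions":2}]`))
	if err != nil {
		panic(err)
	}
	for _, d := range findDisparities(dd, ld, 2) {
		fmt.Printf("%s deps.dev=%d license-db=%d\n", d.Name, d.DepsDev, d.LicenseDB)
	}
}
```

A full outer join is used so that packages present in only one source are surfaced as well, since those are the most extreme disparities.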
Timebox
2d
Conclusion
As of the date this issue was closed, deps.dev is not usable as a source of Go licenses, because pseudo-versions are consistently missing for many Go modules.
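Whether a module's missing versions are tagged releases or pseudo-versions can be checked mechanically. A minimal Go sketch: the regular expression below approximates the pattern used by `golang.org/x/mod/module` to recognise pseudo-versions (base version, 14-digit UTC timestamp, commit hash prefix) but omits that package's extra validity checks; `isPseudoVersion`, `splitVersions`, and the sample versions are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersionRE approximates the Go pseudo-version forms, e.g.
// v0.0.0-20191109021931-daa7c04131f5 and v1.2.4-0.20200101000000-abcdef123456.
var pseudoVersionRE = regexp.MustCompile(`^v[0-9]+\.(0\.0-|[0-9]+\.[0-9]+-([^+]*\.)?0\.)[0-9]{14}-[A-Za-z0-9]+(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$`)

// isPseudoVersion reports whether v looks like a Go pseudo-version.
func isPseudoVersion(v string) bool {
	return pseudoVersionRE.MatchString(v)
}

// splitVersions partitions versions into tagged releases and pseudo-versions.
func splitVersions(versions []string) (tagged, pseudo []string) {
	for _, v := range versions {
		if isPseudoVersion(v) {
			pseudo = append(pseudo, v)
		} else {
			tagged = append(tagged, v)
		}
	}
	return tagged, pseudo
}

func main() {
	// Hypothetical versions present in license-db but absent from deps.dev.
	missing := []string{
		"v1.2.0",
		"v0.0.0-20191109021931-daa7c04131f5",
		"v1.2.4-0.20200101000000-abcdef123456",
	}
	tagged, pseudo := splitVersions(missing)
	fmt.Printf("tagged=%d pseudo=%d\n", len(tagged), len(pseudo))
}
```

For real analysis, `module.IsPseudoVersion` from `golang.org/x/mod` is the stricter choice; the local regex only keeps the sketch dependency-free.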