Find software license in repository when NPM returns an empty license
Release notes
TODO
Problem to solve
The NPM registry lists the software license for a package based on the license field declared in package.json.
If the field is empty, the package is assumed to have no license, which may be incorrect. For example, some packages check in a license file at the root of the repository, e.g. LICENSE or LICENSE.md, and completely ignore the package.json manifest when declaring their license terms.
Proposal
Use a hybrid approach that combines two detection methods (registry-based and file-based) to increase the accuracy and recall of our license scanning for NPM projects. The following cases should be handled:
License file present? | License in package.json present? | Decision |
---|---|---|
Yes | No | Use license in license file. |
No | Yes | Use license in package.json. |
Yes | Yes | Use license in package.json. |
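The decision table can be sketched as a small Go function. This is an illustration only: the `Source` type, its constants, and `chooseSource` are hypothetical names, and the case where neither source has a license is not covered by the table, so returning `SourceNone` there is an assumption.

```go
package main

import "fmt"

// Source identifies where the reported license comes from.
// The type and its values are hypothetical names for this sketch.
type Source string

const (
	SourceManifest    Source = "package.json"
	SourceLicenseFile Source = "license file"
	SourceNone        Source = "none" // assumption: not covered by the table
)

// chooseSource mirrors the decision table: the license declared in
// package.json wins whenever it is present, and the license file is
// only used as a fallback.
func chooseSource(hasLicenseFile, hasManifestLicense bool) Source {
	switch {
	case hasManifestLicense:
		return SourceManifest
	case hasLicenseFile:
		return SourceLicenseFile
	default:
		return SourceNone
	}
}

func main() {
	fmt.Println(chooseSource(true, false)) // license file
	fmt.Println(chooseSource(false, true)) // package.json
	fmt.Println(chooseSource(true, true))  // package.json
}
```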
The golang interfacer implementation keeps a list of file names that are known to contain licenses. It looks for those files in a directory, inspects the ones that exist, and classifies the license(s) they include.
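A minimal sketch of that kind of file lookup, assuming a fixed set of well-known base names. The names in `knownLicenseNames` and the helpers `isLicenseFileName` and `findLicenseFiles` are assumptions for illustration; they are not taken from the golang interfacer.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// knownLicenseNames is a hypothetical list of lower-cased base names
// (without extension) that commonly hold license text.
var knownLicenseNames = map[string]bool{
	"license":   true,
	"licence":   true,
	"copying":   true,
	"unlicense": true,
}

// isLicenseFileName reports whether name looks like a license file,
// ignoring case and a single trailing extension such as .md or .txt.
func isLicenseFileName(name string) bool {
	base := strings.TrimSuffix(name, filepath.Ext(name))
	return knownLicenseNames[strings.ToLower(base)]
}

// findLicenseFiles returns the paths of candidate license files that sit
// directly in dir. Subdirectories are not searched.
func findLicenseFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, entry := range entries {
		if entry.IsDir() {
			continue
		}
		if isLicenseFileName(entry.Name()) {
			found = append(found, filepath.Join(dir, entry.Name()))
		}
	}
	return found, nil
}

func main() {
	files, err := findLicenseFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "scan failed:", err)
		os.Exit(1)
	}
	fmt.Println(files)
}
```

Matching on the base name keeps the check cheap. Files that pass would still need to go through the classifier, since a matching name says nothing about which license the file contains.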
A similar approach could be taken for NPM packages:
- Check if the registry contains an entry for the license in package.json. This can be determined by checking whether the response to the request for the package's metadata returns a license. If it does, then use that as the known license.
- If the registry does not contain an entry for the license, then check for a license file and use the classifier to determine what the licenses are for the package. Use these licenses as the known licenses for the package.
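The two steps above could look roughly like the following. It is a sketch under several assumptions: the metadata response is taken to be a JSON document with an optional string-valued `license` field (other shapes are treated as "no license"), the license file contents are passed in already loaded, and `Classifier`, `licenseFromMetadata`, and `detectLicenses` are hypothetical names rather than the interfacer's real API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Classifier is a hypothetical stand-in for the real license classifier:
// it maps the content of one file to the licenses it recognizes.
type Classifier func(content []byte) []string

// licenseFromMetadata extracts a string-valued "license" field from a
// registry metadata response. Missing, null, empty, or non-string values
// all yield "".
func licenseFromMetadata(body []byte) (string, error) {
	var meta struct {
		License any `json:"license"`
	}
	if err := json.Unmarshal(body, &meta); err != nil {
		return "", err
	}
	s, _ := meta.License.(string)
	return strings.TrimSpace(s), nil
}

// detectLicenses prefers the license reported by the registry and only
// falls back to classifying license files when the registry has none.
func detectLicenses(metadata []byte, licenseFiles [][]byte, classify Classifier) ([]string, error) {
	license, err := licenseFromMetadata(metadata)
	if err != nil {
		return nil, err
	}
	if license != "" {
		return []string{license}, nil
	}
	var found []string
	for _, content := range licenseFiles {
		found = append(found, classify(content)...)
	}
	return found, nil
}

func main() {
	// A fake classifier used only for this example.
	fake := func(content []byte) []string {
		if strings.Contains(string(content), "MIT License") {
			return []string{"MIT"}
		}
		return nil
	}
	files := [][]byte{[]byte("MIT License\n\nPermission is hereby granted")}

	got, _ := detectLicenses([]byte(`{"name":"a","license":"ISC"}`), files, fake)
	fmt.Println(got) // [ISC]

	got, _ = detectLicenses([]byte(`{"name":"b"}`), files, fake)
	fmt.Println(got) // [MIT]
}
```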
Risks
The license files are all read into memory and then passed to the classifier, so their contents stay in memory until the garbage collector runs. If this results in high memory usage, thrashing can occur and degrade the performance of the npm interfacer. Profiling runs can help determine whether the classifier introduces heavy memory consumption. Other options include tuning the garbage collector and reusing memory buffers to avoid extra allocations.
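One way to reuse buffers is a `sync.Pool` of `bytes.Buffer` values combined with a cap on how much of each file is read. The sketch below assumes a 1 MiB cap and a classifier that does not retain the byte slice after it returns; `maxLicenseFileSize`, `classifyReader`, and `classifyText` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// maxLicenseFileSize is a hypothetical cap on the bytes read per file.
const maxLicenseFileSize = 1 << 20

// bufPool hands out reusable buffers so that each file does not need a
// fresh allocation.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// classifyReader reads at most maxLicenseFileSize bytes from r into a
// pooled buffer and passes them to classify. The classifier must not keep
// a reference to the slice, because the buffer is reused afterwards.
func classifyReader(r io.Reader, classify func([]byte) []string) ([]string, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	if _, err := buf.ReadFrom(io.LimitReader(r, maxLicenseFileSize)); err != nil {
		return nil, err
	}
	return classify(buf.Bytes()), nil
}

// classifyText is a convenience wrapper for in-memory text.
func classifyText(text string, classify func([]byte) []string) ([]string, error) {
	return classifyReader(strings.NewReader(text), classify)
}

func main() {
	// A fake classifier used only for this example.
	fake := func(content []byte) []string {
		if bytes.Contains(content, []byte("MIT License")) {
			return []string{"MIT"}
		}
		return nil
	}
	got, err := classifyText("MIT License\n\nPermission is hereby granted", fake)
	fmt.Println(got, err) // [MIT] <nil>
}
```

Garbage collector tuning is a separate knob: in Go it can be adjusted with the `GOGC` environment variable or `debug.SetGCPercent` from `runtime/debug`. Profiling should show whether either measure is needed.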
Feature Usage Metrics
TODO
Implementation Plan
TODO