Capture metadata about license classification

Problem

We don't capture any additional information when we classify a license. This makes it difficult to gauge our confidence in individual classifications and to compare the effectiveness of alternative classification strategies.

Proposed Solution

Capture metadata about each license classification in license-db and in the GCP bucket.

Store license classification data in JSON, in the interfacer directory of the internal GCP bucket.

Implementation

  1. Introduce a new ClassificationResult struct:
type ClassificationResult struct {
	Matches             []string `json:"matches"`
	CacheHit            bool     `json:"cache_hit"`
	LicenseURL          string   `json:"license_url,omitempty"`
	RewrittenLicenseURL string   `json:"rewritten_license_url,omitempty"`
	Response            []byte   `json:"license_url_response,omitempty"`
	TransformedResponse []byte   `json:"license_url_transformed_response,omitempty"`
}
  2. Extend Classify to return a ClassificationResult

  3. Add a Metadata field to data#Version with type []interface{}

  4. Append each ClassificationResult to Version#Metadata

  5. Within dispatcher#Incoming, serialize each Version#Metadata to JSON and write it to the internal GCP bucket at the path {package_registry}/{package_name}/{package_version}/metadata.json
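
The serialization and path-building steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: the Version struct here is a hypothetical stand-in for data#Version (its Registry, Package, and Number fields are made up for the example), MetadataPath and MetadataJSON are hypothetical helper names, and the actual write to the GCP bucket is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// ClassificationResult mirrors the struct proposed in this issue.
type ClassificationResult struct {
	Matches             []string `json:"matches"`
	CacheHit            bool     `json:"cache_hit"`
	LicenseURL          string   `json:"license_url,omitempty"`
	RewrittenLicenseURL string   `json:"rewritten_license_url,omitempty"`
	Response            []byte   `json:"license_url_response,omitempty"`
	TransformedResponse []byte   `json:"license_url_transformed_response,omitempty"`
}

// Version is a hypothetical stand-in for data#Version.
// Only Metadata comes from the proposal; the other fields are assumptions.
type Version struct {
	Registry string
	Package  string
	Number   string
	Metadata []interface{}
}

// MetadataPath (hypothetical helper) builds the object path
// {package_registry}/{package_name}/{package_version}/metadata.json.
func MetadataPath(v Version) string {
	return path.Join(v.Registry, v.Package, v.Number, "metadata.json")
}

// MetadataJSON (hypothetical helper) serializes Version#Metadata to JSON.
func MetadataJSON(v Version) ([]byte, error) {
	return json.Marshal(v.Metadata)
}

func main() {
	v := Version{Registry: "npm", Package: "example", Number: "1.0.0"}

	// Append each ClassificationResult to the version's metadata.
	v.Metadata = append(v.Metadata, ClassificationResult{
		Matches:  []string{"MIT"},
		CacheHit: true,
	})

	body, err := MetadataJSON(v)
	if err != nil {
		panic(err)
	}

	// In dispatcher#Incoming, body would be written to the bucket at this path.
	fmt.Println(MetadataPath(v))
	fmt.Println(string(body))
}
```

One thing to keep in mind: encoding/json encodes []byte values as base64 strings, so Response and TransformedResponse will not be human-readable in metadata.json unless they are stored as string instead.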

Edited by Fabien Catteau