
Update licenses schema limits

Igor Frenkel requested to merge 408901-update-json-schema into master

What does this MR do and why?

Update the PackageMetadata::Package.licenses JSON schema limits to account for outliers found in the last ingestion test: #409732 (comment 1384402080).

How to reproduce the errors

Check out the license validation script repo: https://gitlab.com/ifrenkel/license-schema-validation

To create the list of all validation errors, run: bundle exec ruby check_schema.rb (you can pass the schema URL to this script, e.g. to compare master against this branch).

To summarize the errors, run: bundle exec ruby analyze_errors.rb

The table below gives more detail on the validation errors (generated by the scripts above):

     purl type       |       err type       |    num times seen    |       max val        |   avg memsize (kb)   |  max mem size (kb)   | total mem size (kb)  |  location in schema 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         go          |       maxItems       |         5852         |        23724         |        164.24        |        197.09        |        328.48        | /definitions/versions
         go          |       maxItems       |         363          |         116          |         0.63         |         0.95         |         3.76         | /definitions/license_ids
         go          |       maxItems       |          11          |          20          |         0.16         |         0.20         |         0.66         | /definitions/non_default_licenses
       maven         |       maxItems       |         3908         |         2469         |         4.79         |        25.98         |        38.36         | /definitions/versions
        npm          |       maxItems       |         5183         |        11440         |        19.16         |        131.40        |        191.59        | /definitions/versions
        npm          |         null         |          1           |         256          |         0.41         |         0.41         |         0.41         | /definitions/lowest_version/oneOf/0
        npm          |         null         |          1           |         256          |         0.41         |         0.41         |         0.41         | /definitions/highest_version/oneOf/0
        npm          |      maxLength       |          2           |         256          |         0.41         |         0.41         |         0.41         | /definitions/version
       nuget         |       maxItems       |         2800         |         2647         |        13.35         |        25.98         |        80.07         | /definitions/versions
     packagist       |       maxItems       |         1234         |         1199         |         4.65         |        11.56         |        41.84         | /definitions/versions
     packagist       |       maxItems       |          1           |          13          |         0.14         |         0.14         |         0.14         | /definitions/license_ids
        pypi         |       maxItems       |         816          |         1902         |         7.38         |        17.33         |        51.66         | /definitions/versions
      rubygem        |       maxItems       |         490          |         1125         |         5.19         |        11.56         |        46.71         | /definitions/versions
      rubygem        |       maxItems       |          11          |          12          |         0.13         |         0.13         |         0.13         | /definitions/license_ids
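
To make the "num times seen" and "max val" columns concrete, here is a minimal sketch of how maxItems violations could be collected and grouped. It is not the code from check_schema.rb or analyze_errors.rb: the LIMITS values, the record layout, and the helper names max_items_errors and summarize are all assumptions made for illustration.

```ruby
# Hypothetical limits per schema location; the real values live in the
# JSON schema and are not reproduced here.
LIMITS = {
  "/definitions/versions" => 50,
  "/definitions/license_ids" => 10
}.freeze

# Returns one error record for each array that exceeds its maxItems limit.
def max_items_errors(purl_type, arrays_by_location)
  records = arrays_by_location.map do |location, items|
    limit = LIMITS[location]
    next unless limit && items.length > limit

    { purl_type: purl_type, err_type: "maxItems", location: location, value: items.length }
  end
  records.compact
end

# Groups error records by (purl type, error type, schema location) and
# reports how often each group occurred and its largest observed value.
def summarize(errors)
  errors
    .group_by { |e| e.values_at(:purl_type, :err_type, :location) }
    .transform_values { |group| { count: group.length, max: group.map { |e| e[:value] }.max } }
end

# Made-up packages, not taken from the ingestion data.
errors =
  max_items_errors("go", { "/definitions/versions" => Array.new(60, "v1"),
                           "/definitions/license_ids" => [1, 2] }) +
  max_items_errors("go", { "/definitions/versions" => Array.new(75, "v1") })

summarize(errors).each do |key, stats|
  puts "#{key.join(' | ')} | #{stats[:count]} | #{stats[:max]}"
end
```

Grouping on the three-part key mirrors the row layout of the table above; the memory-size columns are left out because the source does not say how they are measured.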

Histograms

To visualize the distribution of errors (mostly maxItems), grouped by error type and schema location, run: bundle exec ruby histogram.rb

error type: maxItems (schema location: /definitions/license_ids)
0..50: ******* (342)
50..100: * (31)
100..150: * (2)

error type: maxItems (schema location: /definitions/non_default_licenses)
0..50: * (11)

error type: maxItems (schema location: /definitions/versions)
50..100: ************************************************************************************************************************************************************************************************************************************* (11429)
100..150: ************************************************************************************* (4200)
150..200: *********************************** (1703)
200..250: *************** (730)
250..300: ************ (553)
300..350: ******** (354)
350..400: ***** (205)
400..450: **** (182)
450..500: *** (117)
500..550: *** (135)
550..600: ** (67)
600..650: * (44)
650..700: ** (55)
700..750: * (49)
750..800: * (17)
800..850: * (37)
850..900: * (32)
900..950: * (24)
950..1000: * (34)
1000..1050: * (21)
1050..1100: * (26)
1100..1150: * (15)
1150..1200: * (23)
1200..1250: * (10)
1250..1300: * (19)
1300..1350: * (7)
1350..1400: * (6)
1400..1450: * (10)
1450..1500: * (7)
1500..1550: * (13)
1550..1600: * (6)
1600..1650: * (6)
1650..1700: * (9)
1700..1750: * (4)
1750..1800: * (6)
1800..1850: * (4)
1850..1900: * (1)
1900..1950: * (5)
1950..2000: * (7)
2000..: *** (111)

error type: maxLength (schema location: /definitions/version)
250..300: * (2)

error type: null (schema location: /definitions/highest_version/oneOf/0)
250..300: * (1)

error type: null (schema location: /definitions/lowest_version/oneOf/0)
250..300: * (1)
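
The binning behind output like the above can be sketched as follows. This is an illustrative reconstruction, not the contents of histogram.rb: the helper name histogram_rows, the half-open 50-wide bins, the 2000 overflow cutoff, and the scaling of one star per 50 occurrences (rounded up) are assumptions chosen to resemble the rows shown.

```ruby
# Hypothetical helper: groups values into fixed-width bins and renders one
# text row per non-empty bin. Bin width, overflow cutoff, and star scaling
# are assumptions for illustration.
def histogram_rows(values, bin_width: 50, overflow_at: 2000, per_star: 50)
  counts = Hash.new(0)
  values.each do |v|
    # Values at or above the cutoff share a single open-ended bin;
    # everything else falls into the half-open bin [lower, lower + bin_width).
    lower = v >= overflow_at ? overflow_at : (v / bin_width).floor * bin_width
    counts[lower] += 1
  end

  counts.sort.map do |lower, count|
    label = lower >= overflow_at ? "#{lower}.." : "#{lower}..#{lower + bin_width}"
    stars = "*" * (count.to_f / per_star).ceil
    "#{label}: #{stars} (#{count})"
  end
end

# Made-up sample values, not taken from the ingestion data.
puts histogram_rows([51, 75, 99, 120, 2500])
```

Rounding the star count up keeps every non-empty bin visible, which matters for the long tail where most bins hold only a handful of packages.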

Percentiles

For percentiles by error type: bundle exec ruby percentile.rb


error type: maxItems (schema location: /definitions/license_ids)
percentile 0.5, value: 15.0
percentile 0.75, value: 31.0
percentile 0.95, value: 89.0
percentile 0.99, value: 89.0
percentile 0.999, value: 115.62599999999998

error type: maxItems (schema location: /definitions/non_default_licenses)
percentile 0.5, value: 13.0
percentile 0.75, value: 18.0
percentile 0.95, value: 19.5
percentile 0.99, value: 19.9
percentile 0.999, value: 19.990000000000002

error type: maxItems (schema location: /definitions/versions)
percentile 0.5, value: 90.0
percentile 0.75, value: 143.0
percentile 0.95, value: 430.8999999999978
percentile 0.99, value: 1297.1800000000003
percentile 0.999, value: 5603.826000000081
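
For readers who want to reproduce figures of this kind, here is a minimal sketch of a percentile computed by linear interpolation between the two closest ranks. Whether percentile.rb uses this exact definition is an assumption; the fractional values reported above (for example 19.5 and 19.9 for non_default_licenses, with 11 occurrences and a maximum of 20) are consistent with it, but that is an inference rather than something the source confirms. The helper name and sample data are hypothetical.

```ruby
# Hypothetical helper: percentile by linear interpolation between the two
# closest ranks of the sorted values. This is one common definition; the
# actual percentile.rb may use a different one.
def percentile(values, p)
  raise ArgumentError, "values must not be empty" if values.empty?
  raise ArgumentError, "p must be within 0..1" unless (0.0..1.0).cover?(p)

  sorted = values.sort
  rank = p * (sorted.length - 1) # fractional index into the sorted list
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# Made-up sample values, not taken from the ingestion data.
sample = [10, 20, 30, 40, 50]
[0.5, 0.75, 0.95].each do |p|
  puts "percentile #{p}, value: #{percentile(sample, p).round(2)}"
end
```

With a heavily skewed distribution such as /definitions/versions, the gap between the 0.99 and 0.999 percentiles is what indicates how far a new limit has to stretch to cover the remaining outliers.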

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #408901 (closed)

Edited by Igor Frenkel
