Python license information from deps.dev
Problem to solve
The Package Metadata DB fetches licenses of Python packages from the public BigQuery dataset of pypi.org.
However, this dataset is out of sync with PyPI, and as a result recently published Python packages might get the "unknown" license on GitLab.
This is a known issue, but it might never be fixed. See https://github.com/pypi/warehouse/issues/16008
Proposal
Get licenses of Python packages from deps.dev. See https://deps.dev/pypi/pytest
Related spike issue: SPIKE: Investigate deps.dev as a data source fo... (#439634 - closed)
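To make the proposal concrete, here is a minimal sketch of how a license lookup against deps.dev could look. It assumes the public deps.dev HTTP API at https://api.deps.dev/v3 and a `licenses` list in the version response; both should be verified against the current deps.dev API documentation. The helper names (`version_url`, `extract_licenses`, `fetch_licenses`) and the "unknown" fallback are hypothetical and are not part of the Package Metadata DB.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the public deps.dev API (v3).
DEPS_DEV_API = "https://api.deps.dev/v3"


def version_url(name: str, version: str) -> str:
    """Build the deps.dev URL for one PyPI package version."""
    return (
        f"{DEPS_DEV_API}/systems/pypi/packages/"
        f"{quote(name, safe='')}/versions/{quote(version, safe='')}"
    )


def extract_licenses(payload: dict) -> list[str]:
    """Return the license expressions found in a version payload.

    Assumption: the payload carries a "licenses" list of strings.
    Falls back to ["unknown"] when the list is missing or empty.
    """
    licenses = [lic for lic in payload.get("licenses", []) if lic]
    return licenses or ["unknown"]


def fetch_licenses(name: str, version: str, timeout: float = 10.0) -> list[str]:
    """Fetch one package version from deps.dev and return its licenses.

    This performs a network request and raises on HTTP errors.
    """
    with urllib.request.urlopen(version_url(name, version), timeout=timeout) as resp:
        return extract_licenses(json.load(resp))
```

Calling `fetch_licenses("pytest", "8.1.1")` would query deps.dev for that release. URL building and parsing are kept separate from the network call so they can be tested offline.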
Further details
Quoting @SysAdminCybel:
We're currently doing a GitLab Ultimate trial, and we are testing Licence Scanning. We were surprised to see that a lot of well-known, standard Python packages have an unknown license. Upon further investigation, it looks (if I understand correctly) like the PyPI data is sourced from the public BigQuery dataset bigquery-public-data.pypi.distribution_metadata. That dataset is apparently broken, as you can see by running the following query:
SELECT name, version, license, classifiers, upload_time FROM bigquery-public-data.pypi.distribution_metadata WHERE name = "pytest" ORDER BY upload_time DESC
Result:
The most recent pytest release available is 8.1.1, from March 2024. There have been 8 releases of pytest since. I expect all of them to appear as "license unknown" in GitLab since their metadata are not in that dataset.
Interestingly, this is a known issue on the PyPI side. I thought you might want to know.
Thanks!