Python license information from deps.dev

Problem to solve

The Package Metadata DB fetches licenses of Python packages using the BigQuery API of pypi.org. However, this data source is out of sync with PyPI, so recently published Python packages might be reported with an unknown license on GitLab.

This is a known issue, but it might never be fixed. See https://github.com/pypi/warehouse/issues/16008

Proposal

Get licenses of Python packages from deps.dev. See https://deps.dev/pypi/pytest

Related spike issue: SPIKE: Investigate deps.dev as a data source fo... (#439634 - closed)
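
As a rough illustration of the proposal, the sketch below looks up the licenses of a single PyPI package version through the deps.dev HTTP API. It assumes the v3 `GetVersion` endpoint (`/v3/systems/pypi/packages/{name}/versions/{version}`) and a `licenses` array in the JSON response. Both should be checked against the current deps.dev API documentation. The helper names and the `"unknown"` fallback are hypothetical and not part of the Package Metadata DB.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL of the deps.dev v3 API; verify against the deps.dev docs.
DEPS_DEV_API = "https://api.deps.dev/v3"


def version_url(name: str, version: str) -> str:
    """Build the (assumed) GetVersion URL for a PyPI package version."""
    return (
        f"{DEPS_DEV_API}/systems/pypi/packages/"
        f"{quote(name, safe='')}/versions/{quote(version, safe='')}"
    )


def extract_licenses(payload: dict) -> list[str]:
    """Return the licenses listed in a version payload.

    Falls back to ["unknown"] when the (assumed) "licenses" field is
    missing or empty. The fallback value is illustrative only.
    """
    licenses = payload.get("licenses") or []
    return list(licenses) if licenses else ["unknown"]


def fetch_licenses(name: str, version: str, timeout: float = 10.0) -> list[str]:
    """Fetch and parse the licenses of one package version (network call)."""
    with urllib.request.urlopen(version_url(name, version), timeout=timeout) as resp:
        return extract_licenses(json.load(resp))
```

Calling `fetch_licenses("pytest", "8.3.3")` would query deps.dev for a pytest release that is newer than the latest one present in the BigQuery dataset. Parsing is kept separate from the network call so it can be tested offline.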

Further details

Quoting @SysAdminCybel:

We're currently doing a GitLab Ultimate trial, and we are testing License Scanning. We were surprised to see that a lot of well-known, standard Python packages have an unknown license. Upon further investigation, it looks (if I understand correctly) like the PyPI data is sourced from the public BigQuery dataset bigquery-public-data.pypi.distribution_metadata. That dataset is apparently broken, as you can see by running the following query:

SELECT name, version, license, classifiers, upload_time
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE name = "pytest"
ORDER BY upload_time DESC

Result: [image: screenshot of the query output]

The most recent pytest release available in the dataset is 8.1.1, from March 2024. There have been 8 pytest releases since then. I expect all of them to appear as "license unknown" in GitLab, since their metadata is not in that dataset.

Interestingly, this is a known issue on the PyPI side. I thought you might want to know.

Thanks!
