Increase packages_pypi_metadata.keywords text limit
What does this MR do and why?
We recently started uploading more PyPI metadata fields in !131327 (merged).
Unfortunately, because we added a 255-character size limit on the keywords
field, packages with many keywords hit this limit. When a PyPI package with a large keywords array in pyproject.toml
is uploaded, we return an error response:
Uploading python_benedict-0.33.1-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB • 00:00 • 133.5 MB/s
INFO Response from http://gdk.test:3000/api/v4/projects/47/packages/pypi:
400 Bad Request
INFO {"message":"400 Bad request - Validation failed: Keywords is too long (maximum is 255 characters)"}
ERROR HTTPError: 400 Bad Request from http://gdk.test:3000/api/v4/projects/47/packages/pypi
Bad Request
This MR does two things:
- Increase the keywords limit to 1024 characters, to accommodate PyPI packages with many keywords.
- Truncate keywords that exceed 1024 characters, so that PyPI packages that hit the limit can still be uploaded, with the truncated keywords stored in the metadata.
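The truncation behavior can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: the constant MAX_KEYWORDS_LENGTH and the method truncate_keywords are hypothetical names, and the actual implementation may differ in where and how it truncates.

```ruby
# Hypothetical names for illustration; only the 1024 limit comes from this MR.
MAX_KEYWORDS_LENGTH = 1024

# Returns keywords cut to at most MAX_KEYWORDS_LENGTH characters.
# nil is passed through so that packages without keywords are unaffected.
def truncate_keywords(keywords)
  return keywords if keywords.nil?

  # String#[] with (start, length) counts characters, not bytes, which matches
  # the char_length check used by the database constraint.
  keywords[0, MAX_KEYWORDS_LENGTH]
end

short = "dict, json, yaml"
long = "keyword, " * 200 # 1800 characters

puts truncate_keywords(short).length # 16
puts truncate_keywords(long).length  # 1024
```

Truncating before validation means an oversized keywords value no longer fails the upload; the trade-off is that the stored value may end in the middle of a keyword.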
Database migration logs
up
main: == [advisory_lock_connection] object_id: 111600, pg_backend_pid: 84139
main: == 20240219135601 UpdatePypiMetadataKeywodsCheckConstraint: migrating =========
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- execute("ALTER TABLE packages_pypi_metadata\nADD CONSTRAINT check_222e4f5b58\nCHECK ( char_length(keywords) <= 1024 )\nNOT VALID;\n")
main: -> 0.0017s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0003s
main: -- execute("ALTER TABLE packages_pypi_metadata VALIDATE CONSTRAINT check_222e4f5b58;")
main: -> 0.0010s
main: -- execute("RESET statement_timeout")
main: -> 0.0003s
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- execute(" ALTER TABLE packages_pypi_metadata\n DROP CONSTRAINT IF EXISTS check_02be2c39af\n")
main: -> 0.0009s
main: == 20240219135601 UpdatePypiMetadataKeywodsCheckConstraint: migrated (0.0648s)
main: == [advisory_lock_connection] object_id: 111600, pg_backend_pid: 84139
ci: == [advisory_lock_connection] object_id: 111960, pg_backend_pid: 84141
ci: == 20240219135601 UpdatePypiMetadataKeywodsCheckConstraint: migrating =========
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- execute("ALTER TABLE packages_pypi_metadata\nADD CONSTRAINT check_222e4f5b58\nCHECK ( char_length(keywords) <= 1024 )\nNOT VALID;\n")
ci: -> 0.0017s
ci: -- execute("SET statement_timeout TO 0")
ci: -> 0.0002s
ci: -- execute("ALTER TABLE packages_pypi_metadata VALIDATE CONSTRAINT check_222e4f5b58;")
ci: -> 0.0004s
ci: -- execute("RESET statement_timeout")
ci: -> 0.0002s
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- execute(" ALTER TABLE packages_pypi_metadata\n DROP CONSTRAINT IF EXISTS check_02be2c39af\n")
ci: -> 0.0005s
ci: == 20240219135601 UpdatePypiMetadataKeywodsCheckConstraint: migrated (0.0239s)
ci: == [advisory_lock_connection] object_id: 111960, pg_backend_pid: 84141
main: == [advisory_lock_connection] object_id: 112180, pg_backend_pid: 84144
main: == 20240222000000 RemovePackagesProtectionRulesPackageNamePatternIlikeQueryColumn: migrating
main: -- column_exists?(:packages_protection_rules, :package_name_pattern_ilike_query)
main: -> 0.0037s
main: == 20240222000000 RemovePackagesProtectionRulesPackageNamePatternIlikeQueryColumn: migrated (0.0105s)
main: == [advisory_lock_connection] object_id: 112180, pg_backend_pid: 84144
ci: == [advisory_lock_connection] object_id: 112260, pg_backend_pid: 84146
ci: == 20240222000000 RemovePackagesProtectionRulesPackageNamePatternIlikeQueryColumn: migrating
ci: -- column_exists?(:packages_protection_rules, :package_name_pattern_ilike_query)
ci: -> 0.0044s
ci: == 20240222000000 RemovePackagesProtectionRulesPackageNamePatternIlikeQueryColumn: migrated (0.0205s)
ci: == [advisory_lock_connection] object_id: 112260, pg_backend_pid: 84146
down
main: == [advisory_lock_connection] object_id: 117760, pg_backend_pid: 87014
main: == 20240214204805 MakeFindingIdNotNull: reverting =============================
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- execute(" ALTER TABLE vulnerabilities\n DROP CONSTRAINT IF EXISTS check_4d8a873f1f\n")
main: -> 0.0017s
main: == 20240214204805 MakeFindingIdNotNull: reverted (0.0165s) ====================
main: == [advisory_lock_connection] object_id: 117760, pg_backend_pid: 87014
ci: == [advisory_lock_connection] object_id: 117820, pg_backend_pid: 87399
ci: == 20240214204805 MakeFindingIdNotNull: reverting =============================
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- execute(" ALTER TABLE vulnerabilities\n DROP CONSTRAINT IF EXISTS check_4d8a873f1f\n")
ci: -> 0.0013s
ci: == 20240214204805 MakeFindingIdNotNull: reverted (0.0201s) ====================
ci: == [advisory_lock_connection] object_id: 117820, pg_backend_pid: 87399
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
No UI changes
How to set up and validate locally
A. Verify that a package with keywords longer than 255 characters can be uploaded
- Install the prerequisites
- Clone this python package
- Build the package. From the package directory, run
python3 -m build
- Set up authentication following our guide
- Upload the package. From the package directory, run
python3 -m twine upload --verbose --repository gitlab dist/*
NOTE: The upload will fail if you try to upload a version that already exists. Delete the existing version in Package Registry, or build and upload a different version.
Expected response when running the MR branch:
Uploading python_benedict-0.33.1-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB • 00:00 • 134.5 MB/s
INFO Response from http://gdk.test:3000/api/v4/projects/47/packages/pypi:
201 Created
INFO {"message":"201 Created"}
Uploading python-benedict-0.33.1.tar.gz
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.8/99.8 kB • 00:00 • 138.6 MB/s
INFO Response from http://gdk.test:3000/api/v4/projects/47/packages/pypi:
201 Created
INFO {"message":"201 Created"}
- Verify that the package metadata was populated. From a Rails console, run
::Packages::Package.last.pypi_metadatum.keywords
The metadatum record should have been created, with the keywords field set to the contents of the keywords key in pyproject.toml.
In the master branch, the upload will fail:
Uploading python_benedict-0.33.1-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB • 00:00 • 133.5 MB/s
INFO Response from http://gdk.test:3000/api/v4/projects/47/packages/pypi:
400 Bad Request
INFO {"message":"400 Bad request - Validation failed: Keywords is too long (maximum is 255 characters)"}
ERROR HTTPError: 400 Bad Request from http://gdk.test:3000/api/v4/projects/47/packages/pypi
Bad Request
B. Verify that a package with keywords longer than 1024 characters can be uploaded
- Modify the pyproject.toml file from the previous section. Add keywords until the keywords array, when converted to a string, is longer than 1024 characters. Duplicate keywords are allowed, so you can simply copy and paste lines 12-63 of pyproject.toml repeatedly into the keywords array.
- Change the value of __version__ in benedict/metadata.py to a version that has not been uploaded yet, because uploading an existing version fails.
- Clean up the previous build artifacts:
rm dist/*
- Rebuild the package:
python3 -m build
- Upload the new package version:
python3 -m twine upload --verbose --repository gitlab dist/*
- Verify that the keywords field was populated and truncated. From a Rails console, run
::Packages::Package.last.pypi_metadatum.keywords.length
[3] pry(main)> ::Packages::Package.last.pypi_metadatum.keywords.length
=> 1024
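As an alternative to pasting keywords by hand in step B, a small Ruby script can generate a keywords array that is guaranteed to exceed the limit. This is a sketch only: padded_keywords and KEYWORDS_LIMIT are hypothetical names, the sample keywords are placeholders, and it assumes the build backend joins keywords with a comma, which you may need to adjust.

```ruby
# Hypothetical helper for section B; only the 1024 limit comes from this MR.
KEYWORDS_LIMIT = 1024

# Repeats the base keywords until the joined string exceeds the limit.
# Assumption: keywords are joined with "," when converted to a string.
def padded_keywords(base, limit: KEYWORDS_LIMIT, separator: ",")
  raise ArgumentError, "base must not be empty" if base.empty?

  keywords = base.dup
  keywords.concat(base) while keywords.join(separator).length <= limit
  keywords
end

keywords = padded_keywords(%w[dict json yaml toml])
puts keywords.join(",").length > KEYWORDS_LIMIT # true

# Print a TOML array line that can be pasted into pyproject.toml.
puts "keywords = [#{keywords.map(&:inspect).join(', ')}]"
```

Duplicates are fine here because, as noted in the steps above, duplicate keywords are allowed.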
Related to #440402 (closed)
Edited by Radamanthus Batnag