Normalizing dashes and underscores in project names
Running python -m build on some projects that use dashes in their names caused some interesting problems, which led me down a long rabbit hole. I'm recording my findings here so that they can be fixed someday.
Symptoms
meson.build | pyproject.toml | results |
---|---|---|
jinja-tutu | jinja-tutu | The wheel file and the jinja-tutu-0.2.0.dist-info/ dir use dashes in the name. The RECORD file is generated in jinja-tutu.dist-info/RECORD (no version in the directory name). METADATA records the name with a dash. |
jinja_tutu | jinja_tutu | All files are created with underscores in their names and the RECORD file is generated in the correct directory. METADATA records the name with an underscore. |
jinja-tutu | jinja_tutu | Traceback while searching for jinja_tutu-0.2.0.tar.xz: https://gitlab.com/-/snippets/2280457 |
jinja_tutu | jinja-tutu | Traceback like the one above, but looking for jinja-tutu-0.2.0.tar.xz |
jinja-tutu | No name field | Same as when both files use dashes |
jinja_tutu | No name field | Same as when both files use underscores |
Specifications and other tool behaviours
There are four places where the name is output:
- The wheel filename
  - Specified here: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
  - Dashes in the name portion must be transformed into underscores.
- As part of the dist-info directory in the wheel
  - The specification just talks about distribution here without specifying that it should be escaped in this instance. However, setuptools does transform dashes to underscores here, just like with the wheel filename.
- Inside the METADATA file
  - I found several specifications that offer no guidance on normalization:
  - And one which specifies that underscores are normalized to dashes:
    - https://peps.python.org/pep-0503/#normalized-names
  - setuptools appears to use whatever was specified as the project name: dashes if it has dashes, underscores if it has underscores.
- The sdist filename
  - The only thing I found contains wrong information: it says the sdist filename isn't standardized (true) but that the de facto standard is to normalize to underscores (false).
  - setuptools uses whatever was specified as the project name. If dashes were used, the sdist filename contains dashes. If underscores were used, the sdist filename contains underscores.
Note: I'm going to refer to the normalization to perform as "to underscores" or "to dashes" below, but the actual rule in PEP 503 is re.sub(r"[-_.]+", "-", name).lower(). So "to dashes" can use that regex verbatim, while "to underscores" would use the same regex but substitute underscores instead of dashes.
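As a minimal sketch of the two normalizations described in the note: the regex is the PEP 503 one quoted above, while the function names normalize_to_dashes and normalize_to_underscores are hypothetical and do not exist in mesonpep517.

```python
import re

# Runs of "-", "_" and "." are treated as a single separator (PEP 503).
_SEPARATORS = re.compile(r"[-_.]+")


def normalize_to_dashes(name: str) -> str:
    """PEP 503 normalization, verbatim: separators become "-", lowercased."""
    return _SEPARATORS.sub("-", name).lower()


def normalize_to_underscores(name: str) -> str:
    """Same regex, but substituting "_" instead of "-" (hypothetical helper)."""
    return _SEPARATORS.sub("_", name).lower()


if __name__ == "__main__":
    print(normalize_to_dashes("jinja_tutu"))       # jinja-tutu
    print(normalize_to_underscores("jinja-tutu"))  # jinja_tutu
```

Both helpers are idempotent, so applying them to an already normalized name is harmless.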
Problem code
I think the version is left out of the directory name in the RECORD path because we're constructing a wheel filename that doesn't normalize dashes to underscores and then passing that to wheel.wheelfile.WheelFile() here:
That class uses a regex to parse the filename into name and version parts and that regex captures the wrong data when there are dashes in the name portion. Normalizing dashes to underscores in the filename should fix that. It will fix the directory that RECORD is written to and the wheel filename.
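To illustrate how a dash in the name can throw off this kind of parsing, here is a small sketch. The pattern below is a simplified stand-in written for this example, not copied from the wheel package, and the assumption that the dist-info directory is derived from the name-plus-version capture is mine; the real code may differ in detail.

```python
import re

# Hypothetical, simplified wheel-filename pattern for illustration only.
# Fields are separated by "-", with an optional build tag that must start
# with a digit.
WHEEL_NAME_RE = re.compile(
    r"^(?P<namever>(?P<name>.+?)-(?P<ver>.+?))"
    r"(-(?P<build>\d[^-]*))?"
    r"-(?P<pyver>.+?)-(?P<abi>.+?)-(?P<plat>.+?)\.whl$"
)


def dist_info_dir(wheel_filename: str) -> str:
    """Return the dist-info directory implied by the parsed filename."""
    match = WHEEL_NAME_RE.match(wheel_filename)
    if match is None:
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    # Assumption: the directory is built from the name-version capture.
    return match.group("namever") + ".dist-info"


if __name__ == "__main__":
    # With a dash in the name, "tutu" is captured as the version and
    # "0.2.0" as the build tag, so the version drops out of the directory.
    print(dist_info_dir("jinja-tutu-0.2.0-py3-none-any.whl"))  # jinja-tutu.dist-info
    print(dist_info_dir("jinja_tutu-0.2.0-py3-none-any.whl"))  # jinja_tutu-0.2.0.dist-info
```

With this stand-in pattern, the dashed filename reproduces the symptom from the table (a dist-info directory without the version), while the underscored filename parses as intended.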
The dist-info dir also needs to be normalized to underscores. That would be done here:
The METADATA needs to do the opposite normalization (underscores to dashes). I think that should be done here: https://gitlab.com/thiblahute/mesonpep517/-/blob/aabe01e993b87769860741b25d124f9c3fa0c8f2/mesonpep517/buildapi.py#L272 which will normalize the input prior to writing the METADATA file but will leave the source name intact for other uses (like in sdist).
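Put together, the proposed fix might look roughly like the sketch below. The helper derive_names is hypothetical, and the py3-none-any tag is an assumption made only to produce a complete filename; neither is taken from mesonpep517.

```python
import re


def derive_names(source_name: str, version: str) -> dict:
    """Derive the normalized output names while leaving source_name intact.

    Hypothetical helper: the "py3-none-any" tag is an assumed placeholder.
    """
    underscored = re.sub(r"[-_.]+", "_", source_name).lower()
    dashed = re.sub(r"[-_.]+", "-", source_name).lower()
    return {
        # Wheel filename and dist-info dir: normalize "to underscores".
        "wheel_filename": f"{underscored}-{version}-py3-none-any.whl",
        "dist_info_dir": f"{underscored}-{version}.dist-info",
        # METADATA: normalize "to dashes".
        "metadata_name": dashed,
        # The unmodified name stays available for other uses, e.g. the sdist.
        "source_name": source_name,
    }


if __name__ == "__main__":
    for key, value in derive_names("jinja-tutu", "0.2.0").items():
        print(f"{key}: {value}")
```

Deriving the normalized forms at the point of use, rather than overwriting the project name, keeps the sdist behaviour unchanged.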
The sdist problems listed in the symptoms only occur when the names in meson.build and pyproject.toml disagree, so that's probably not important for this issue, but it will come into play when solving #16.