This MR contains the following updates:
Package | Change | Age | Adoption | Passing | Confidence |
---|---|---|---|---|---|
charset-normalizer | ==2.1.1 -> ==3.1.0 | | | | |
Release Notes
Ousret/charset_normalizer
v3.1.0
Added
- Argument `should_rename_legacy` for the legacy function `detect`, which now also disregards any new arguments without errors (MR #262)
Removed
- Support for Python 3.6 (MR #260)
Changed
- Optional speedup provided by mypy/c 1.0.1
v3.0.1
Fixed
- Multi-bytes cutter/chunk generator did not always cut correctly (MR #233)
Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7
v3.0.0
Added
- Extend the capability of explain=True: when cp_isolation contains one or two entries, the Mess-detector results are logged in detail
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides extra speedup (meaning mypyc compilation whl)
Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
Fixed
- CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation
Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Method `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`
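Because 3.0.0 removes several entry points, callers that relied on them may need a small adjustment. The sketch below shows one possible replacement pattern built on `from_bytes`; the helper name `decode_best_effort` and the sample payload are hypothetical and not part of the library.

```python
from charset_normalizer import from_bytes


def decode_best_effort(payload: bytes) -> str:
    """Hypothetical helper: decode bytes using the most plausible match."""
    # best() is called on the CharsetMatches collection returned by from_bytes,
    # not on an individual CharsetMatch (those methods were removed in 3.0.0).
    best = from_bytes(payload).best()
    if best is None:
        raise ValueError("no plausible encoding found")
    # str() on a match yields the decoded text.
    return str(best)


text = decode_best_effort(
    "Señor, ¿dónde está la estación de tren más cercana?".encode("utf-8")
)
print(text)
```

Whether such a helper is needed depends on how this project uses the package; a search for the removed names listed above is a reasonable first check before merging.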
Configuration
- [ ] If you want to rebase/retry this MR, check this box
This MR has been generated by Renovate Bot.