Improve the parsing of IANA Language Subtag Registry
- `extlang` entries with Prefix ✅ (resolved in commit 10158081 and 75c4b691)
%%
Type: extlang
Subtag: bfi
Description: British Sign Language
Added: 2009-07-29
Preferred-Value: bfi
Prefix: sgn
Now: seems okay (the entry has a name in the properties panel), but the components are empty (even the first component, and notably the ISO 639-3 code).
- Put `Prefix` or `Macrolanguage` in the legacy `language` element ✅ (resolved in commit 10158081, 3a0af259 and 75c4b691)
%%
Type: extlang
Subtag: gan
Description: Gan Chinese
Added: 2009-07-29
Preferred-Value: gan
Prefix: zh
Macrolanguage: zh
%%
Type: extlang
Subtag: yue
Description: Yue Chinese
Description: Cantonese
Added: 2009-07-29
Preferred-Value: yue
Prefix: zh
Macrolanguage: zh
Now:
- `--language 0:yue` => language: `und`, IETF: `yue`
- `--language 0:gan` => language: `und`, IETF: `gan`

Goal:
- language: `chi`, IETF: `zh-yue` / `zh-gan`
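
The goal above can be sketched roughly as follows. This is only an illustration, not MKVToolNix's actual code: `parse_record`, `resolve_extlang` and the `LEGACY_CODES` table are hypothetical names, and the table holds just the single `zh` => `chi` mapping needed here.

```ruby
# Hypothetical helper: parse one registry record (the lines between "%%"
# separators) into a Hash. Repeated fields such as Description are collected
# into arrays. Folded continuation lines are not handled in this sketch.
def parse_record(text)
  record = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    name, value = line.strip.split(/\s*:\s*/, 2)
    record[name] << value if name && value
  end
  record
end

# Assumed excerpt of an ISO 639-1 => ISO 639-2 mapping for the legacy element.
LEGACY_CODES = { "zh" => "chi" }.freeze

# Derive the legacy language and the primary+extlang IETF tag from the
# record's Prefix (or, failing that, its Macrolanguage).
def resolve_extlang(record)
  subtag = record["Subtag"].first
  parent = (record["Prefix"] + record["Macrolanguage"]).first
  return { language: "und", ietf: subtag } unless parent

  { language: LEGACY_CODES.fetch(parent, parent), ietf: "#{parent}-#{subtag}" }
end

yue = parse_record(<<~RECORD)
  Type: extlang
  Subtag: yue
  Description: Yue Chinese
  Description: Cantonese
  Added: 2009-07-29
  Preferred-Value: yue
  Prefix: zh
  Macrolanguage: zh
RECORD

p resolve_extlang(yue)
```

An entry without `Prefix` or `Macrolanguage` falls back to `und` plus the bare subtag, which mirrors the "Now" behaviour described above.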
- `grandfathered` entries considered invalid
%%
Type: grandfathered
Tag: en-GB-oed
Description: English, Oxford English Dictionary spelling
Added: 2003-07-09
Deprecated: 2015-04-17
Preferred-Value: en-GB-oxendict
%%
Type: grandfathered
Tag: i-klingon
Description: Klingon
Added: 1999-05-26
Deprecated: 2004-02-24
Preferred-Value: tlh
%%
Type: grandfathered
Tag: zh-min-nan
Description: Minnan, Hokkien, Amoy, Taiwanese, Southern Min, Southern
  Fujian, Hoklo, Southern Fukien, Ho-lo
Added: 2001-03-26
Deprecated: 2009-07-29
Preferred-Value: nan
- `redundant` entries with `Preferred-Value` codes
%%
Type: redundant
Tag: sgn-JP
Description: Japanese Sign Language
Added: 2001-11-11
Deprecated: 2009-07-29
Preferred-Value: jsl
Be careful with loops :P (they shouldn't occur if we stick to BCP 47)
%%
Type: redundant
Tag: zh-cmn
Description: Mandarin Chinese
Added: 2005-07-15
Deprecated: 2009-07-29
Preferred-Value: cmn
%%
Type: redundant
Tag: zh-cmn-Hans
Description: Mandarin Chinese (Simplified)
Added: 2005-07-15
Deprecated: 2009-07-29
Preferred-Value: cmn-Hans
For the current MKVToolNix structure, `zh-cmn-Hans` is preferred over `cmn-Hans`.
- `zh-guoyu` (grandfathered) => `cmn` (canonical form) => `zh-cmn` (MKVToolNix style: primary+extlang)
- `zh-min-nan` (grandfathered) => `nan` (canonical form) => `zh-nan` (primary+extlang)
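
The two-step chain above, including a guard against `Preferred-Value` loops, might look like the sketch below. The `PREFERRED` and `EXTLANG_PREFIX` tables are assumed excerpts built from the records quoted on this page, and `canonicalize` / `to_primary_extlang` are hypothetical names rather than existing MKVToolNix functions.

```ruby
require "set"

# Assumed excerpt of Preferred-Value mappings (grandfathered and redundant tags).
PREFERRED = {
  "zh-guoyu"    => "cmn",
  "zh-min-nan"  => "nan",
  "zh-cmn"      => "cmn",
  "zh-cmn-Hans" => "cmn-Hans",
  "sgn-JP"      => "jsl",
  "i-klingon"   => "tlh",
  "en-GB-oed"   => "en-GB-oxendict",
}.freeze

# Assumed excerpt of extlang Prefix fields.
EXTLANG_PREFIX = { "cmn" => "zh", "nan" => "zh", "yue" => "zh", "gan" => "zh" }.freeze

# Follow Preferred-Value until a tag without a mapping is reached.
# A tag that is visited twice indicates a loop, so raise instead of spinning.
def canonicalize(tag)
  seen = Set.new
  while PREFERRED.key?(tag)
    raise ArgumentError, "Preferred-Value loop at #{tag}" unless seen.add?(tag)
    tag = PREFERRED[tag]
  end
  tag
end

# Re-add the primary language in front of a leading extlang subtag.
def to_primary_extlang(tag)
  first, *rest = tag.split("-")
  prefix = EXTLANG_PREFIX[first]
  prefix ? [prefix, first, *rest].join("-") : tag
end

%w[zh-guoyu zh-min-nan zh-cmn-Hans].each do |tag|
  canonical = canonicalize(tag)
  puts "#{tag} => #{canonical} => #{to_primary_extlang(canonical)}"
end
```

Keeping the two steps as separate functions means the primary+extlang form is produced once, at the end, and is never fed back into the `Preferred-Value` lookup.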
- `variant` prefix too strict (fixed in commit a73c424e, 4db742b4 and ...)
%%
Type: variant
Subtag: jyutping
Description: Jyutping Cantonese Romanization
Added: 2010-10-23
Prefix: yue
Comments: Jyutping romanization of Cantonese
- `extlang`; improve notice about multiple `extlang` definitions after canonicalization: allow only one ✅ (resolved in commit faf86c74)

2.2.2. Extended Language Subtags
4. Although the ABNF production 'extlang' permits up to three
extended language tags in the language tag, extended language
subtags MUST NOT include another extended language subtag in
their 'Prefix'. That is, the second and third extended language
subtag positions in a language tag are permanently reserved and
tags that include those subtags in that position are, and will
always remain, invalid.
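
A rough way to enforce the rule quoted above is to count the three-letter subtags that directly follow the primary language. This is a simplified sketch with hypothetical helper names; it does not handle grandfathered tags or private-use (`x-...`) tags.

```ruby
# Return the extlang subtags of a tag: the run of three-letter alphabetic
# subtags immediately after a two- or three-letter primary language.
def extlang_subtags(tag)
  primary, *rest = tag.split("-")
  return [] unless primary && primary.match?(/\A[a-z]{2,3}\z/i)

  rest.take_while { |subtag| subtag.match?(/\A[a-z]{3}\z/i) }
end

# The ABNF permits up to three extlang subtags, but only one can ever be valid.
def extlang_count_valid?(tag)
  extlang_subtags(tag).size <= 1
end

%w[zh-yue zh-cmn-Hans zh-cmn-yue en-GB].each do |tag|
  puts "#{tag}: #{extlang_count_valid?(tag) ? 'ok' : 'more than one extlang'}"
end
```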
- After canonicalization: option in Preferences to choose a default canonicalization style ✅ (resolved in commit 75c4b691)
For example:
- I know what I'm typing
- BCP 47
- BCP 47 + primary/macro language tags widely used in practice (`zh`)
- `qae`-`qtz` considered invalid ✅ (fixed in cc6a7b39)
("a".."d").each do |letter|
  alpha_3 = "qa#{letter}"
end
People may run into this when software doesn't modify `Qxx` language tags from DCP packages.
Some widely used tags in industry practice:

DCNC tag | IETF tag | note |
---|---|---|
QMS | cmn-Hans | Chinese, Mandarin, simplified characters, subtitles only (hard- or soft-subbed) |
QMT | cmn-Hant | Chinese, Mandarin, traditional characters, subtitles only (hard- or soft-subbed) |
QTM | cmn-TW | Chinese, Mandarin, Taiwan accent, audio only |
`Qxx` tags of this kind are seen in more than 95% of theatrical prints in China, for example:

DCP title | audio | subtitle |
---|---|---|
Uncharted_FTR_S_EN-QMS_CN_51-Dbox_2K_SPE_20220224_DLX_IOP_OV | English | Hans (soft-subbed) |
Uncharted_FTR_S_CMN-QMS_CN_51-Dbox_2K_SPE_20220225_DLX_IOP_OV | Mandarin (cmn), Beijing accent | Hans (soft-subbed) |
TheBatman_FTR-3-Censor_S_EN-qms_CN_51-Dbox_4K_WR_20220307_DLX_IOP_OV | English | Hans (hard-subbed) |

Simply allowing `Qxx` is enough.
If you're interested in normalizing them, filter `Q` on this page, or refer to this JSON.
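
For illustration, accepting the whole range that ISO 639-2 reserves for local use (`qaa` through `qtz`) and optionally normalizing the three DCNC tags from the table above could be sketched like this. The names `private_use_language?`, `normalize_dcnc` and `DCNC_TO_IETF` are hypothetical, and the mapping contains only the rows listed on this page.

```ruby
# qaa..qtz: "q", then a letter from a to t, then any letter.
PRIVATE_USE_ALPHA_3 = /\Aq[a-t][a-z]\z/i

# True for any code in the reserved local-use range, regardless of case.
def private_use_language?(code)
  code.match?(PRIVATE_USE_ALPHA_3)
end

# Assumed normalization table, copied from the DCNC table above.
DCNC_TO_IETF = {
  "qms" => "cmn-Hans",
  "qmt" => "cmn-Hant",
  "qtm" => "cmn-TW",
}.freeze

# Map a known DCNC tag to its IETF equivalent; leave anything else untouched.
def normalize_dcnc(code)
  DCNC_TO_IETF.fetch(code.downcase, code)
end

%w[qad qae QMS qtz qua].each do |code|
  puts "#{code}: private use? #{private_use_language?(code)}, normalized: #{normalize_dcnc(code)}"
end
```

A pattern is used instead of a string range check because a plain lexicographic comparison would also accept shorter strings such as `qb`.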