Use Aho-Corasick-like trie in sysencoding.inc.
Not a serious proposal, but it reduces the Lazarus executable size (both of the functions in question are linked into it) by 7 KB (i386) to 11 KB (x64). An empty LCL application seems to use only `CodePageToCodePageName` and shrinks by just under 2 KB or so. The change also contains a program to update the lookup tables (more) conveniently and to maintain bimap semantics by default. Presently, and presumably because no such automated generator exists:
- `CodePageHashes` contains two pairs of meaningless duplicates where the second entry will never be returned by `CodePageNameToCodePage`: `'iso-2022-jp'` maps to both `cp = 50220` and `cp = 50222`, and `'euc-jp'` maps to both `cp = 20932` and `cp = 51932`.
- `CodePageNames` also contains meaningless duplicates: `'x-Chinese_CNS'` vs. `'x-chinese-cns'` and `'x_Chinese-Eten'` vs. `'x-chinese-eten'`. Note that in the first case the binary search returns the second of the equal items, but in the second case it returns the first.
- `CodePageNameToCodePage('iso-8859-11')` returns 874, but `CodePageToCodePageName(874)` returns `'windows-874'`, while `'iso-8859-11'` is returned by `CodePageToCodePageName(28601)`. The Wikipedia page about ISO 8859-11 does mention its close relation to code page 874, but as far as I can see it is not the same as 874, and the whole thing looks like a copy-paste artifact.
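For illustration, the kind of consistency check an automated generator could run before emitting the tables might look roughly like this. It is a minimal Python sketch, not the generator included in this change. The names `NAME_TO_CP`, `CP_TO_NAME`, `duplicate_keys`, and `round_trip_failures` are hypothetical, and the sample data holds only the entries quoted above, not the real tables.

```python
from collections import defaultdict

# Partial sample built only from the entries quoted in this description.
# The real tables in sysencoding.inc are much larger.
NAME_TO_CP = [
    ("iso-2022-jp", 50220),
    ("iso-2022-jp", 50222),
    ("euc-jp", 20932),
    ("euc-jp", 51932),
    ("iso-8859-11", 874),
]
CP_TO_NAME = [
    (874, "windows-874"),
    (28601, "iso-8859-11"),
]


def duplicate_keys(pairs):
    """Return {key: [values]} for every key that occurs more than once."""
    seen = defaultdict(list)
    for key, value in pairs:
        seen[key].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


def round_trip_failures(name_to_cp, cp_to_name):
    """Return (name, cp, name_back) triples where name -> cp -> name_back
    does not lead back to the original name.

    Assumption: the first entry for a duplicated key wins. Code pages that
    are absent from cp_to_name are skipped, because the sample is partial.
    """
    forward = {}
    for name, cp in name_to_cp:
        forward.setdefault(name, cp)
    backward = {}
    for cp, name in cp_to_name:
        backward.setdefault(cp, name)

    failures = []
    for name, cp in forward.items():
        name_back = backward.get(cp)
        if name_back is not None and name_back != name:
            failures.append((name, cp, name_back))
    return failures


if __name__ == "__main__":
    print("duplicate names:", duplicate_keys(NAME_TO_CP))
    print("duplicate code pages:", duplicate_keys(CP_TO_NAME))
    print("round-trip failures:", round_trip_failures(NAME_TO_CP, CP_TO_NAME))
```

On the sample data this flags both duplicated names and the `'iso-8859-11'` / 874 mismatch. A generator that refuses to emit tables while either list is non-empty would keep the two lookups a bimap by construction.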