
Use Aho-Corasick-like trie in sysencoding.inc.

Rika requested to merge runewalsh/source:syscodepages into main

Not a serious proposal, but it reduces the Lazarus executable size by 7 KB (i386) to 11 KB (x64), since both of the functions in question are linked into Lazarus. An empty LCL application seems to use only CodePageToCodePageName and shrinks by under 2 KB or so. The merge request also contains a program to update the lookup tables more conveniently and to maintain bimap semantics by default. Presently, and supposedly because of the absence of such an automated generator:

  • CodePageHashes contains two pairs of meaningless duplicates where the second will never be returned by CodePageNameToCodePage: 'iso-2022-jp' maps both to cp = 50220 and to cp = 50222, and 'euc-jp' maps both to cp = 20932 and cp = 51932.

  • CodePageNames also contains meaningless duplicates: 'x-Chinese_CNS' vs. 'x-chinese-cns' and 'x_Chinese-Eten' vs. 'x-chinese-eten'. Note that in the first case the binary search returns the second of the equal items, but in the second case it returns the first.

  • CodePageNameToCodePage('iso-8859-11') returns 874, but CodePageToCodePageName(874) returns 'windows-874', while 'iso-8859-11' is returned by CodePageToCodePageName(28601). The Wikipedia page about ISO 8859-11 indeed mentions its close relation to 874, but as far as I can see it is not 874 after all, and the entire thing looks like a copy-paste artifact.
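
To illustrate the kind of check a table generator could run to catch the problems listed above, here is a minimal Python sketch. It is not the generator from this merge request and not FPC code: the flat `ENTRIES` list, the function name `find_bimap_violations`, and the case-insensitive comparison are all assumptions made for the example. Only the name/code page pairs themselves are taken from the list above.

```python
from collections import defaultdict

# Hypothetical flat list of (name, code page) pairs. The pairs are the ones
# quoted in the description; merging both tables into one list is an assumption.
ENTRIES = [
    ("iso-2022-jp", 50220),
    ("iso-2022-jp", 50222),
    ("euc-jp", 20932),
    ("euc-jp", 51932),
    ("windows-874", 874),
    ("iso-8859-11", 874),
    ("iso-8859-11", 28601),
]


def find_bimap_violations(entries):
    """Return (names mapped to several code pages, code pages with several names)."""
    by_name = defaultdict(set)
    by_cp = defaultdict(set)
    for name, cp in entries:
        key = name.lower()  # assumption: names are compared case-insensitively
        by_name[key].add(cp)
        by_cp[cp].add(key)
    # A bimap allows exactly one code page per name and one name per code page.
    ambiguous_names = {n: sorted(c) for n, c in by_name.items() if len(c) > 1}
    ambiguous_cps = {c: sorted(n) for c, n in by_cp.items() if len(n) > 1}
    return ambiguous_names, ambiguous_cps


if __name__ == "__main__":
    names, cps = find_bimap_violations(ENTRIES)
    for name, pages in names.items():
        print(f"name {name!r} maps to several code pages: {pages}")
    for cp, aliases in cps.items():
        print(f"code page {cp} has several names: {aliases}")
```

A generator that refuses to emit tables while either dictionary is non-empty would enforce bimap semantics by default, with any intentional aliases listed explicitly as exceptions.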

Edited by Rika
