UnicodeData Alternative Any% WR
This makes the UnicodeData unit much smaller and, more often than not, faster. It also adds SimpleTitleCase support, which maps digraphs like “ǳ” (this is one character, try selecting it) to their correct title forms: “ǲ” instead of “Ǳ”. See #39577 for rough numbers and an idea of what’s going on. On real strings, NormalizeNFD becomes roughly 30% to 50% faster than it is now, rather than just equal.
The program ucd_pack.pas generates unicodedata_props.inc from UnicodeData.txt. It is supposed to go into utils/unicode, but I don’t have a clue about what makefiles are. :’-(
Simpler version: ucd_pack_c.pas, which tries less hard to be clever and relies more on the standard library.
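For context on where the title-case data comes from: in UnicodeData.txt, field 12 of each line is the simple uppercase mapping and field 14 is the simple titlecase mapping. The following is only a minimal sketch of reading those fields, not how ucd_pack.pas or ucd_pack_c.pas actually does it; HexField is a hypothetical helper, and the sample line is the U+01F3 entry.

```pascal
program ucd_fields;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the hexadecimal value stored in the given
// field of a UnicodeData.txt line, or -1 when that field is empty/missing.
function HexField(const Line: string; Index: Integer): Integer;
var
  Fields: TStringList;
begin
  Fields := TStringList.Create;
  try
    Fields.StrictDelimiter := True; // split on ';' only, keep spaces in names
    Fields.Delimiter := ';';
    Fields.DelimitedText := Line;
    if (Index < Fields.Count) and (Fields[Index] <> '') then
      Result := StrToInt('$' + Fields[Index])
    else
      Result := -1;
  finally
    Fields.Free;
  end;
end;

const
  // UnicodeData.txt entry for U+01F3 (the one-character digraph "dz").
  SampleLine =
    '01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2';

begin
  // Field 12: simple uppercase mapping, field 14: simple titlecase mapping.
  WriteLn('upper: ', IntToHex(HexField(SampleLine, 12), 4));
  WriteLn('title: ', IntToHex(HexField(SampleLine, 14), 4));
end.
```

Code that title-cases with the uppercase mapping would pick U+01F1 here, while the titlecase field gives U+01F2, which is the difference SimpleTitleCase is meant to capture.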
I kept TUC_Prop and GetProps to avoid breaking too much, marking them as deprecated. They are emulated by passing a codepoint value as @self, which is dirty but very lightweight; the unit was already using low-level tricks of the same kind, such as dynamically allocated structures of variable length.
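To illustrate the @self trick in isolation, here is a rough sketch under my own assumptions, not the code from this merge request. TFakeProp, PFakeProp and GetFakeProps are hypothetical stand-ins for TUC_Prop, its pointer type and GetProps: the "pointer" never points at real memory, it just carries the code point, and the record methods decode it from @Self.

```pascal
program self_trick;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  PFakeProp = ^TFakeProp;

  // Hypothetical record: no instance ever exists in memory.
  // A PFakeProp value holds a code point, not a real address.
  TFakeProp = record
    Placeholder: Byte; // never read; only the methods below are used
    function CodePoint: Cardinal;
    function IsAsciiUpper: Boolean;
  end;

function TFakeProp.CodePoint: Cardinal;
begin
  // @Self is whatever "address" the caller used, i.e. the code point itself.
  Result := Cardinal(PtrUInt(@Self));
end;

function TFakeProp.IsAsciiUpper: Boolean;
begin
  // A real unit would consult its packed tables here instead.
  Result := (CodePoint >= Ord('A')) and (CodePoint <= Ord('Z'));
end;

// Hypothetical counterpart of GetProps: smuggles the code point in a pointer.
function GetFakeProps(ACodePoint: Cardinal): PFakeProp;
begin
  Result := PFakeProp(PtrUInt(ACodePoint));
end;

begin
  // The methods never touch the memory behind the pointer,
  // so the bogus address is never actually read.
  WriteLn(GetFakeProps($41)^.CodePoint);
  WriteLn(GetFakeProps($41)^.IsAsciiUpper);
end.
```

This only works as long as callers use the methods and never read fields through the pointer, which is presumably part of why the old names are marked as deprecated.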