UnicodeData Alternative Any% WR

Rika requested to merge runewalsh/source:ucd_separate into main

This makes the UnicodeData unit much smaller and, more often than not, faster, and adds SimpleTitleCase support that maps digraphs like “ǳ” (U+01F3; this is one character, try selecting it) to their correct title forms, “ǲ” (U+01F2) instead of “Ǳ” (U+01F1). See #39577 for rough numbers and an idea of what’s going on. On real strings, NormalizeNFD is more like 30% to 50% faster than it is now, rather than just equal.
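To make the title-case point concrete, here is a minimal sketch that compares the uppercase and titlecase forms of the digraph U+01F3. It uses Python's built-in string tables purely as an independent reference for the Unicode mappings; it is not the Pascal unit from this merge request, and `case_forms` is a hypothetical helper name.

```python
# Compare the uppercase and titlecase forms of a single-character digraph.
# Uses Python's built-in Unicode tables as a reference; this is NOT the
# Pascal UnicodeData unit, and case_forms is a made-up helper for illustration.

DZ_SMALL = "\u01F3"  # LATIN SMALL LETTER DZ, a single code point


def case_forms(ch: str) -> dict:
    """Return the code points of the upper and title forms of one character."""
    return {
        "upper": [f"U+{ord(c):04X}" for c in ch.upper()],
        "title": [f"U+{ord(c):04X}" for c in ch.title()],
    }


forms = case_forms(DZ_SMALL)
print(forms)  # {'upper': ['U+01F1'], 'title': ['U+01F2']}
```

The two results differ: uppercasing gives the all-capitals digraph, while titlecasing gives the form with only the first letter capitalized, which is the distinction SimpleTitleCase is meant to preserve.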

The program ucd_pack.pas generates unicodedata_props.inc from UnicodeData.txt. It is supposed to be put into utils/unicode, but I don’t have a clue about what makefiles are. :’-(

There is also a simpler version, ucd_pack_c.pas, which tries to be less clever and relies more on the standard library.

I kept TUC_Prop and GetProps so as not to break too much, and marked them as deprecated. They are emulated by passing the codepoint value as @self, which is a dirty but very lightweight approach; the unit was already using low-level tricks of the same kind, such as dynamically allocated structures of variable length.

Edited by Rika
