Generate Case Table and Find Case Mapping for any Character
The table is a binary file that takes the form
<32-bit key> <32-bit lower> <32-bit upper> <32-bit title>
where the case hex value is 0 if there is no mapping. The binary file has a "row" for each number between 1 and 1114109, which is the value of the largest codepoint in UnicodeData.txt. (UnicodeData.txt, however, only has 31618 lines.) This mapping of absent lines was done to enable constant-time indexing into the binary mapping file.
The binary mapping file has a size of ~17.8 MB.
I have not yet been able to get the official tests to work, but I made a few tests that are generated in generate-case-table.lua
in the function test_case_mapping()
. Here are the results:
-----------------------------------------------------------------
Mapping Tests
-----------------------------------------------------------------
test: à
title = À
upper = À
test: A
lower = a
test: a
title = A
upper = A
test: ȱ
title = Ȱ
upper = Ȱ
test: ☃
test: 蓝
test: ힰ
test: A
lower = a
test: a
title = A
upper = A