
Generate Case Table and Find Case Mapping for any Character

Copied From !2

The table is a binary file that takes the form

<32-bit key> <32-bit lower> <32-bit upper> <32-bit title>

where a case field is 0 if the character has no such mapping. The binary file has a "row" for every number between 1 and 1114109 (0x10FFFD), which is the largest codepoint in UnicodeData.txt. UnicodeData.txt itself only has 31618 lines, so most rows carry no mappings; they are written anyway so that the row for any codepoint sits at a fixed offset and can be read in constant time.
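The fixed-width layout above is what makes the lookup a single seek. Here is a minimal sketch of the idea, written in Python for illustration rather than in the Lua of generate-case-table.lua. The names `write_table` and `lookup`, the big-endian byte order, and the assumption that the first row belongs to key 1 are mine and are not taken from the actual script.

```python
import io
import struct

# One row: key, lower, upper, title, each an unsigned 32-bit integer.
# Big-endian byte order is an assumption; the description does not state it.
ROW = struct.Struct(">4I")
FIRST_KEY = 1  # assumed from "between 1 and 1114109"


def write_table(out, mappings, last_key):
    """Write one row per key in FIRST_KEY..last_key; 0 means 'no mapping'."""
    for key in range(FIRST_KEY, last_key + 1):
        lower, upper, title = mappings.get(key, (0, 0, 0))
        out.write(ROW.pack(key, lower, upper, title))


def lookup(table, codepoint):
    """Seek directly to the row for `codepoint` and unpack it."""
    table.seek((codepoint - FIRST_KEY) * ROW.size)
    key, lower, upper, title = ROW.unpack(table.read(ROW.size))
    if key != codepoint:
        raise ValueError("row key does not match requested codepoint")
    return {"lower": lower, "upper": upper, "title": title}


# Tiny in-memory demo table covering U+0001..U+00FF with three mappings.
demo = {
    0x41: (0x61, 0, 0),     # A: lower = a
    0x61: (0, 0x41, 0x41),  # a: upper = A, title = A
    0xE0: (0, 0xC0, 0xC0),  # à: upper = À, title = À
}
buf = io.BytesIO()
write_table(buf, demo, 0xFF)
print(lookup(buf, 0xE0))
```

The stored key is redundant for lookup, since the offset already identifies the codepoint, but it allows a cheap sanity check that the seek landed on the intended row.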

The binary mapping file has a size of ~17.8 MB.
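As a quick check on that figure, the size follows from the row layout. The snippet below is only arithmetic on the numbers quoted in this description. The comparison with a table that stores one row per UnicodeData.txt line is a hypothetical alternative, not something this merge request implements.

```python
ROW_BYTES = 4 * 4     # four 32-bit fields per row
ROWS = 1114109        # one row per key from 1 to 1114109
DATA_LINES = 31618    # lines in UnicodeData.txt, as quoted above

dense_size = ROW_BYTES * ROWS
sparse_size = ROW_BYTES * DATA_LINES  # hypothetical: rows only for listed lines

print(dense_size, round(dense_size / 1e6, 1))    # 17825744 bytes, about 17.8 MB
print(sparse_size, round(sparse_size / 1e6, 1))  # 505888 bytes, about 0.5 MB
```

A table with rows only for listed lines would be far smaller, but finding a row would then need a search or a separate index instead of a direct seek.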

I have not yet been able to get the official tests to work, but I wrote a few tests of my own in the function test_case_mapping() in generate-case-table.lua. Here are the results:

-----------------------------------------------------------------
                       Mapping Tests
-----------------------------------------------------------------
test: à
	title = À
	upper = À

test: A
	lower = a

test: a
	title = A
	upper = A

test: ȱ
	title = Ȱ
	upper = Ȱ

test: ☃

test: 蓝

test: ힰ

test: A
	lower = a

test: a
	title = A
	upper = A
