LLVM - Pointer values of procedural variable constants cast from integers are truncated

Summary

When constructing procedural variable constants from integer values (with the help of {$modeswitch pointertoprocvar}), the LLVM backend truncates the stored pointer value to a common word size (8, 16, 32 or 64 bits), depending on the number of leading (i.e. high) set bits.

Quite a mouthful, isn't it? You may ask yourself: Why is this relevant? Why would you need to cast integers into function pointers?

Well, the reason I stumbled upon this bug in the first place is the following definition from SQLite3:

typedef void (*sqlite3_destructor_type)(void*);
#define SQLITE_TRANSIENT ((sqlite3_destructor_type) -1)

In my own SQLite3 header translation, I expressed this in Pascal like so:

{$modeswitch pointertoprocvar}
type
  psqlite3_destructor = procedure (block: pcvoid); SQLITE_API;
const
  SQLITE_TRANSIENT: psqlite3_destructor = Pointer(-1);

(I want SQLITE_TRANSIENT to be usable without requiring that both a cast to psqlite3_destructor and {$modeswitch pointertoprocvar} be present at the site of use, meaning that the constant has to be typed.)

An LLVM-compiled application that makes use of SQLite3 (and hence this constant) crashes with an access violation at address 0x00000000000000FF. As I had to find out, the value of SQLITE_TRANSIENT stored in the binary isn't 0xFFFFFFFFFFFFFFFF, but 0x00000000000000FF, which leads SQLite3 to use it as a genuine function pointer.

Changing Pointer(-1) to Pointer(High(PtrUInt)), Pointer(not PtrUInt(0)) or similar tricks has no effect.

After some testing, I found out that procvar constants get truncated according to the following strange pattern, where:

  • . is a nibble with any bit pattern (i.e. 0bxxxx)
  • + is a nibble that has its high bit set (i.e. 0b1xxx)
  • - is a nibble that has its high bit unset (i.e. 0b0xxx)

0xFFFFFFFFFFFFFF+. -> 0x00000000000000+.
0xFFFFFFFFFFFFFF-. -> 0x000000000000FF-.
0xFFFFFFFFFFFF+... -> 0x000000000000+...
0xFFFFFFFFFFFF-... -> 0x00000000FFFF-...
0xFFFFFFFF+....... -> 0x00000000+.......
0xFFFFFFFF-....... -> 0xFFFFFFFF-.......

Some example values:

  • 0xFFFFFFFFFFFFFFFF -> 0x00000000000000FF
  • 0xFFFFFFFFFFFFFF80 -> 0x0000000000000080
  • 0xFFFFFFFFFFFFFF7F -> 0x000000000000FF7F
  • 0xFFFFFFFFFFFFAAAA -> 0x000000000000AAAA
  • 0xFFFFFFFFFFFF2AAA -> 0x00000000FFFF2AAA

System Information

  • Operating system: Linux (Debian 12.10.0)
  • Processor architecture: x86_64
  • Compiler version: Trunk @ d3975bd65e6b4865abeaa0ce3448bb2df7b67ba8 and LLVM 16

Steps to reproduce

Compile & run the following program (using an LLVM-enabled version of FPC):

Example Project

program llvm_procvar_const;

{$C+} // enable assertions so that the final Assert is actually checked
{$modeswitch pointertoprocvar}

const
  INTEGER_VALUE = $FFFFFFFFFFFFFFFF;
  POINTER_VALUE: TProcedure = Pointer(INTEGER_VALUE);

var
  Ptr, Int: String;

begin
  Ptr := HexStr(POINTER_VALUE);
  Int := HexStr(PtrUInt(INTEGER_VALUE), SizeOf(Pointer) * 2);
  WriteLn('Pointer(', Ptr, '), PtrUInt(', Int, ')');
  Assert(Ptr = Int);
end.