Incorrect memory layout of objects and broken copy-by-assignment
Summary
The compiler appears to place the pointer to the virtual method table (VMT) after the object's fields rather than before them. This directly contradicts the documentation. Worse, the VMT pointer gets overwritten when a value of a child object type is assigned to an instance of the base type.
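As a quick independent check of where the VMT pointer sits, one can scan the instance word by word and compare each word with the address returned by TypeOf. This is only a minimal sketch: the program name find_vmt_offset and the trimmed-down copy of TStructBase are illustrative and not part of the example project, and {$POINTERMATH ON} is assumed to be needed only to make the pointer indexing explicit.

```pascal
program find_vmt_offset;
{$POINTERMATH ON} // assumption: stated explicitly so PPointer(...)[i] is accepted

type
  // Trimmed-down, hypothetical copy of the type from the example project.
  TStructBase = object
    FieldBase: PtrUInt;
    constructor Init;
    procedure Method; virtual;
  end;

constructor TStructBase.Init;
begin
  // Empty on purpose: the constructor call itself stores the VMT pointer.
end;

procedure TStructBase.Method;
begin
end;

var
  base: TStructBase;
  i: SizeInt;
begin
  base.Init;
  base.FieldBase := 1337;
  // Walk the instance in pointer-sized steps and report every slot
  // that holds the address of the VMT of TStructBase.
  for i := 0 to SizeOf(base) div SizeOf(Pointer) - 1 do
    if PPointer(@base)[i] = TypeOf(TStructBase) then
      WriteLn('VMT pointer found at byte offset ', i * SizeOf(Pointer));
end.
```

If the layout is as described in this report, the reported offset should equal SizeOf(PtrUInt) rather than 0.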
System Information
- Operating system: Windows 7 Ultimate x64
- Processor architecture: x86-64
- Compiler version: 3.2.2 (from Lazarus 3.6)
- Device: Computer
Example Project
program object_vmt;
//{$MODE objfpc}
// see also the discussion here: Problem with objects`vmt structure
// https://gitlab.com/freepascal.org/fpc/source/-/issues/34239
uses SysUtils{, objects};
type
  TStructBase = object
    FieldBase: PtrUInt;
    constructor Init();
    procedure method(); virtual;
  end;
  TStructChild = object(TStructBase)
    FieldChild: PtrUInt;
    procedure method(); virtual;
  end;
constructor TStructBase.Init();
begin
  WriteLn('init: ', IntToHex(PtrUInt(Pointer(@Self))));
end;
procedure TStructBase.method();
begin
  WriteLn('base method: ', FieldBase,
    ' ', Pointer(@FieldBase)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructBase(nil^).FieldBase)) // just another way, must be equal to previous
  );
end;
procedure TStructChild.method();
begin
  inherited;
  WriteLn('child method: ', FieldBase, ' ', FieldChild,
    ' ', Pointer(@FieldBase)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructBase(nil^).FieldBase)), // just another way, must be equal to previous
    ' ', Pointer(@FieldChild)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructChild(nil^).FieldChild)) // ditto
  );
end;
procedure polymorphical_indirect_invocation(constref kk: TStructBase);
begin
  kk.method();
end;
procedure value_copy_invocation(jj: TStructBase);
begin
  jj.method();
end;
var
  base: TStructBase;
  child: TStructChild;
begin
  WriteLn(IntToHex(PtrUInt(Addr(base))));
  base.Init(); // inits VMT
  base.FieldBase := 1337;
  base.method(); // works
  polymorphical_indirect_invocation(base); // works
  value_copy_invocation(base); // works
  WriteLn(PPtrUInt(@base)^, LineEnding); // BUG 1: not a VMT value gets printed.
  WriteLn(IntToHex(PtrUInt(Addr(child))));
  child.Init(); // inits VMT
  child.FieldBase := 2024;
  child.FieldChild := 228;
  child.method(); // works
  polymorphical_indirect_invocation(child); // works
  value_copy_invocation(child); // BUG 2: overwrites VMT of the argument value.
  base := child; // BUG 2: overwrites VMT of the 'base' instance,
  base.method(); // so a descendant method assuming 'child' fields gets invoked.
  WriteLn(PPtrUInt(@child)^, LineEnding); // BUG 1: not a VMT value gets printed.
  WriteLn(SizeOf(PtrUInt));
  ReadLn();
end.
What is the current bug behavior?
Compiling with -Pi386 (only for shorter pointer values; x86_64 behaves the same way), I get this:
0041C010
init: 0041C010
base method: 1337 0 0
base method: 1337 0 0
base method: 1337 0 0
1337
0041C020
init: 0041C020
base method: 2024 0 0
child method: 2024 228 0 0 8 8
base method: 2024 0 0
child method: 2024 228 0 0 8 8
base method: 2024 0 0
child method: 2024 4309024 0 0 8 8
base method: 2024 0 0
child method: 2024 0 0 0 8 8
2024
4
Note that 4309024 is 41C020 in hexadecimal, which matches the address of the child instance in memory. This means that, instead of field data, the child method read stale stack contents located right after the copied object; they had been left there by the preceding WriteLn call.
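The decimal-to-hexadecimal correspondence can be verified with a one-line program. The program name hex_check is purely illustrative.

```pascal
program hex_check;
uses SysUtils;
begin
  // 4*16^5 + 1*16^4 + 12*16^3 + 2*16 = 4309024
  WriteLn(IntToHex(4309024, 8)); // prints 0041C020
end.
```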
What is the expected behavior?
0041C010
init: 0041C010
base method: 1337 4 4
base method: 1337 4 4
base method: 1337 4 4
<VMT pointer of TStructBase in base-10>
0041C020
init: 0041C020
base method: 2024 4 4
child method: 2024 228 4 4 8 8
base method: 2024 4 4
child method: 2024 228 4 4 8 8
base method: 2024 4 4
base method: 2024 4 4
<VMT pointer of TStructChild in base-10>
4
Possible fixes
Copy-by-assignment LV := RV should set the VMT pointer of the LV object according to its declared type. This would introduce initialization-by-assignment semantics as an alternative to constructor invocation, so it should be documented as such.
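Until the compiler behaves this way, the effect can be approximated by hand by re-invoking the constructor on the target right after the assignment. The following is only a sketch under stated assumptions: the constructor must not touch any fields (it is empty here), the method Kind and the program name restore_vmt are hypothetical, and the sketch relies on a constructor call storing the VMT pointer of the declared type, as the "inits VMT" comments in the example project indicate.

```pascal
program restore_vmt;

type
  TStructBase = object
    FieldBase: PtrUInt;
    constructor Init;
    function Kind: string; virtual; // hypothetical helper, not in the example project
  end;
  TStructChild = object(TStructBase)
    FieldChild: PtrUInt;
    function Kind: string; virtual;
  end;

constructor TStructBase.Init;
begin
  // Must stay free of field writes, otherwise re-running it
  // after the assignment would clobber the copied data.
end;

function TStructBase.Kind: string;
begin
  Kind := 'base';
end;

function TStructChild.Kind: string;
begin
  Kind := 'child';
end;

var
  base: TStructBase;
  child: TStructChild;
begin
  base.Init;
  child.Init;
  child.FieldBase := 2024;

  base := child; // per this report, this also copies the VMT pointer of TStructChild
  base.Init;     // workaround: store the VMT pointer of the declared type again

  WriteLn(base.Kind);      // dispatches through the VMT of TStructBase again
  WriteLn(base.FieldBase); // the copied field value is preserved
end.
```

The obvious drawback is that this only works for types whose constructor has no side effects, which is why a compiler-level fix would be preferable.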
Another approach would be to do a plain copy of the fields, as happens with records, without touching the VMT pointer. But then passing objects with virtual methods by value into routines would have to be prohibited, because the by-value copy would otherwise remain uninitialized (the compiler must not call an object constructor implicitly).
It would be nice to keep the VMT pointer not at the beginning of the object, but just before it. This would make the layout of data in objects more predictable and allow them to be treated like regular records, including raw memcpy-like copying. The latter, however, requires the field order to be properly specified when visibility specifiers are present.