Incorrect memory layout of objects and broken copy-by-assignment

Summary

The compiler appears to place the pointer to the virtual method table (VMT) after the object's fields instead of before them. This directly contradicts the documentation. What's worse, the VMT pointer gets overwritten when a value of a descendant object type is assigned to an instance of the base type.

System Information

  • Operating system: Windows 7 Ultimate x64
  • Processor architecture: x86-64
  • Compiler version: 3.2.2 (from Lazarus 3.6)
  • Device: Computer

Example Project

program object_vmt;

//{$MODE objfpc}

// see also the discussion here: Problem with objects`vmt structure
// https://gitlab.com/freepascal.org/fpc/source/-/issues/34239
uses SysUtils{, objects};

type
  TStructBase = object
    FieldBase: PtrUInt;
    constructor Init();
    procedure method(); virtual;
  end;

  TStructChild = object(TStructBase)
    FieldChild: PtrUInt;
    procedure method(); virtual;
  end;

constructor TStructBase.Init();
begin
  WriteLn('init: ', IntToHex(PtrUInt(Pointer(@Self))));
end;

procedure TStructBase.method();
begin
  WriteLn('base method: ', FieldBase,
    ' ', Pointer(@FieldBase)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructBase(nil^).FieldBase))  // just another way, must be equal to previous
  );
end;

procedure TStructChild.method();
begin
  inherited;
  WriteLn('child method: ', FieldBase, ' ', FieldChild,
    ' ', Pointer(@FieldBase)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructBase(nil^).FieldBase)),  // just another way, must be equal to previous
    ' ', Pointer(@FieldChild)-Pointer(@Self),
    ' ', PtrUInt(Addr(TStructChild(nil^).FieldChild))  // ditto
  );
end;

procedure polymorphical_indirect_invocation(constref kk: TStructBase);
begin
  kk.method();
end;

procedure value_copy_invocation(jj: TStructBase);
begin
  jj.method();
end;

var
  base: TStructBase;
  child: TStructChild;
begin
  WriteLn(IntToHex(PtrUInt(Addr(base))));
  base.Init();  // inits VMT
  base.FieldBase := 1337;
  base.method();  // works
  polymorphical_indirect_invocation(base);  // works
  value_copy_invocation(base);  // works
  WriteLn(PPtrUInt(@base)^, LineEnding);  // BUG 1: the printed value is FieldBase, not the VMT pointer.

  WriteLn(IntToHex(PtrUInt(Addr(child))));
  child.Init();  // inits VMT
  child.FieldBase := 2024;
  child.FieldChild := 228;
  child.method();  // works
  polymorphical_indirect_invocation(child);  // works
  value_copy_invocation(child);  // BUG 2: overwrites VMT of the argument value.
  base := child;  // BUG 2: overwrites VMT of the 'base' instance,
  base.method();  //        so a descendant method assuming 'child' fields gets invoked.
  WriteLn(PPtrUInt(@child)^, LineEnding);  // BUG 1: the printed value is FieldBase, not the VMT pointer.

  WriteLn(SizeOf(PtrUInt));
  ReadLn();
end.

What is the current bug behavior?

Compiling with -Pi386 (only for shorter pointer values; x86_64 behaves the same way), I get this:

0041C010
init: 0041C010
base method: 1337 0 0
base method: 1337 0 0
base method: 1337 0 0
1337

0041C020
init: 0041C020
base method: 2024 0 0
child method: 2024 228 0 0 8 8
base method: 2024 0 0
child method: 2024 228 0 0 8 8
base method: 2024 0 0
child method: 2024 4309024 0 0 8 8
base method: 2024 0 0
child method: 2024 0 0 0 8 8
2024

4

Note that 4309024 is 41C020 in base-16, which matches the address of the child instance in memory. This means that FieldChild was not read from the copied value: the method read stale stack contents located right after the copied object, left there by the preceding WriteLn call.
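The base-10 to base-16 correspondence can be double-checked with a throwaway program (a small sketch that is not part of the example project; the program name is made up):

```pascal
program hex_check;

uses SysUtils;

begin
  // 4309024 is the stray FieldChild value from the output above.
  // Print it as 8 hex digits to compare with the instance address.
  WriteLn(IntToHex(4309024, 8));
end.
```

This should print 0041C020, the same value that was printed for the address of 'child'.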

What is the expected behavior?

0041C010
init: 0041C010
base method: 1337 4 4
base method: 1337 4 4
base method: 1337 4 4
<VMT pointer of TStructBase in base-10>

0041C020
init: 0041C020
base method: 2024 4 4
child method: 2024 228 4 4 8 8
base method: 2024 4 4
child method: 2024 228 4 4 8 8
base method: 2024 4 4
base method: 2024 4 4
<VMT pointer of TStructChild in base-10>

4

Possible fixes

Copy-by-assignment LV := RV should set the VMT pointer of the LV object according to its declared type. This would introduce initialization-by-assignment semantics as an alternative to constructor invocation, so it should be documented as such.
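Until the compiler does something like this by itself, the intended effect can probably be approximated by hand: re-running the constructor on the destination after the assignment should point it back at the VMT of its declared type. A minimal sketch, assuming a constructor that does not touch any fields (TBase, TChild and the program name are hypothetical, reduced from the example project; I have not checked this on every target):

```pascal
program vmt_restore;

type
  TBase = object
    FieldBase: PtrUInt;
    constructor Init;
    procedure Method; virtual;
  end;

  TChild = object(TBase)
    FieldChild: PtrUInt;
    procedure Method; virtual;
  end;

constructor TBase.Init;
begin
  // Intentionally empty: only the implicit VMT setup matters here.
end;

procedure TBase.Method;
begin
  WriteLn('base method: ', FieldBase);
end;

procedure TChild.Method;
begin
  WriteLn('child method: ', FieldBase, ' ', FieldChild);
end;

var
  base: TBase;
  child: TChild;
begin
  base.Init;
  child.Init;
  child.FieldBase := 2024;
  child.FieldChild := 228;

  base := child;  // copies FieldBase, but also the VMT pointer of TChild
  base.Init;      // workaround: constructor call resets the VMT pointer for TBase
  base.Method;    // intended to dispatch to TBase.Method again
end.
```

This only works as long as the constructor has no side effects on the copied fields, which is exactly why doing it in the compiler would be preferable.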

Another approach would be to do a plain copy, as happens with records, without touching the pointer to the VMT. But then passing objects with virtual methods by value into routines would have to be prohibited, because the copies would otherwise obviously remain uninitialized (the compiler should not call an object constructor implicitly by itself).

It would be nice to keep the pointer to the VMT not at the beginning of the object, but just before it. This would make the layout of data in objects more predictable and allow them to be treated like regular records, including raw memcpy-like copying. The latter requires a properly specified order of fields when there are visibility specifiers, though.
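For reference, the position where the compiler currently stores the VMT pointer can be probed without relying on field offsets. A rough sketch, assuming that TypeOf from the System unit returns the VMT address of an object type and that no field happens to hold the same value (TProbe and the program name are made up for illustration):

```pascal
program vmt_probe;

type
  TProbe = object
    FieldBase: PtrUInt;
    constructor Init;
    procedure Method; virtual;
  end;

constructor TProbe.Init;
begin
  // Empty body: the constructor call itself sets up the VMT pointer.
end;

procedure TProbe.Method;
begin
end;

var
  p: TProbe;
  ofs: SizeUInt;
begin
  p.Init;
  p.FieldBase := 0;  // make sure the field cannot be mistaken for the VMT pointer

  // Walk the instance in pointer-sized steps and compare each slot
  // with the VMT address of the declared type.
  ofs := 0;
  while ofs + SizeOf(Pointer) <= SizeOf(p) do
  begin
    if PPointer(PByte(@p) + ofs)^ = TypeOf(TProbe) then
      WriteLn('VMT pointer found at byte offset ', ofs);
    Inc(ofs, SizeOf(Pointer));
  end;
end.
```

With the layout described in this report, I would expect it to report a non-zero offset, that is, a slot after FieldBase.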

Edited by Dmitry D. Chernov