Skip to content

Hashing bug related in generic dictionaries on Darwin/x86-64 and CPU i3-8100B

Summary

A simple program that uses a generic dictionary built with trunk compiler on macOS/x86_64 allows duplicate keys.

System Information

  • Operating system: macOS
  • Processor architecture: x86-64 Intel i3-8100B
  • Compiler version: trunk [3.3.1-12931-gfa44cd9d]
  • Device: Computer

Steps to reproduce

  1. Create a program named test:
program test;

{$MODE DELPHI}

uses
  Lang;

begin
  LangObj := TLangClass.Create;
  LangObj.Test('Russian');
end.
  1. Create a second unit named lang:
unit Lang;

{$MODE DELPHI}

interface

uses
  Generics.Collections;

type
  TLangClass = class(TObjectDictionary<string, TObject>)
  public
    procedure Test(const Value: string);
  end;

var
  LangObj: TLangClass;

implementation

{TLanguage}

procedure TLangClass.Test(const Value: string);
const
  SRussian = 'Russian';
var
  Key: string;
  LangDict: TObject;
begin
  writeln('Adding string = ', SRussian);
  WriteLn('Input equals const = ', Value = SRussian);
  Add(SRussian, TObject.Create);
  WriteLn('Added item found = ', TryGetValue(Value, LangDict));
  WriteLn('Now oddly dictionary can hold a duplicate');
  Add(Value, TObject.Create);
  WriteLn('Dictionary keys: ');
  for Key in Keys do
    WriteLn('K = ', Key, ' V = ', Items[Key].GetHashCode);
end;

end.
  1. Compile it with trunk compiler on a x86_64 based macOS.
  2. Run the program and observe the following odd output:
user201578@SY451 x86_64-darwin % ./fpc test.pas && ./test

Free Pascal Compiler version 3.3.1-12931-gfa44cd9da6 [2023/07/10] for x86_64
Copyright (c) 1993-2023 by Florian Klaempfl and others
Target OS: Darwin for x86_64
Compiling test.pas
Assembling (pipe) test.s
Linking test
11 lines compiled, 1.2 sec, 1212416 bytes code, 393215 bytes data

Adding string = Russian
Input equals const = TRUE
Added item found = FALSE
Now oddly dictionary can hold a duplicate
Dictionary keys:
K = Russian V = 4375294400
K = Russian V = 4375294368

user201578@SY451 x86_64-darwin %

Example Project

Trivial example project attached: bug-generics-compiler.zip

What is the current bug behavior?

As you may see, the Test method takes a string value "Russian". Inside the method there is a constant with the same value. The input string and the constant are identical as confirmed by direct string comparison.

Yet when we add the constant to the dictionary and then try finding the input value, nothing is found. Consequentially the dictionary allows two identical string keys to be added.

What is the expected (correct) behavior?

Finding the input value must succeed as the same value was already added. Furthermore, dictionary, by design, must not allow duplicates.

Additional details

  1. The bug only exhibits itself when the dictionary class is in a separate unit and it was introduced in commit e1bbcf70

  2. The string itself has something to do with it. For example, if I replace "Russian" with "ABC" in both units, it works as expected.

  3. Only the x86_64-darwin target seems affected, possibly with the specific CPU. I have further tested with aarch64-darwin, i386-win32, x86_64-win64 and x86_64-linux with newer CPUs and they all work correctly.

  4. It's likely caused by Darwin's ABI issues with the ASM code from the above-mentioned commit, or the CPU doesn't understand some of the ASM instructions from that commit.

Edited by foxpas
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information