Closed
Open
Created Jul 20, 2021 by FPC Admin account (@fpc_admin, Owner)

frac function is slow on AMD in Linux (but fast on Intel or in Windows)

Original Reporter info from Mantis: Artlav
  • Reporter name: Artyom

Description:

frac function is about 20 times slower on Linux on AMD CPUs (tested on Ryzen 2600, Ryzen 3600 and Threadripper 3975WX) than on Windows on the same CPUs.
On Intel CPUs it's close to equally fast on every OS.
This only happens on x86_64, when compiled for i386 there is no difference in performance.

Digging into the RTL: on Windows it uses the fpc_frac_real that sits outside the FPC_HAS_TYPE_EXTENDED ifdef (in rtl/x86_64/math.inc), which is double SSE code similar to frac_sse in my example below.
On Linux it uses the one inside that ifdef, which is extended x87 fistpq code.

So it comes down to Windows not supporting the extended type and thus getting a double SSE frac implementation, while Linux does support the extended type and thus uses the extended x87 frac implementation.
And as far as I can find, AMD's implementation of the old 80-bit FPU operations is MUCH, MUCH slower than Intel's.
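To check which of the two code paths a given target gets, a minimal sketch along these lines can be compiled on each platform. It only relies on the FPC_HAS_TYPE_EXTENDED define mentioned above; the program name ext_check is made up for illustration.

```pascal
program ext_check;
begin
 // FPC_HAS_TYPE_EXTENDED is defined by the compiler when the target
 // has a real 80-bit extended type (the x87 path described above).
{$ifdef FPC_HAS_TYPE_EXTENDED}
 writeln('extended is a separate type, sizeof(extended) = ',sizeof(extended));
{$else}
 // Otherwise extended falls back to double (the SSE path).
 writeln('no separate extended type, sizeof(extended) = ',sizeof(extended));
{$endif}
 writeln('sizeof(double) = ',sizeof(double));
end.
```

Going by the description above, an x86_64 Windows build should take the second branch and an x86_64 Linux build the first.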

Given that frac is a fairly basic function and AMD CPUs are rapidly gaining popularity, this is a rather critical issue.

Steps to reproduce:

Run this code

//############################################################################//
{$ifdef mswindows}{$apptype console}{$endif}
program frac_tst;
//############################################################################//
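function frac_sse(const d:double):double;assembler;nostackframe;
asm
 // Move the raw bits of d into rax and keep only the top 16 bits
 movq %xmm0, %rax
 shr $48, %rax
 // Isolate the 11-bit biased exponent
 and $0x7ff0,%ax
 // Exponent >= 0x433 (1023+52) means |d| >= 2^52: no fractional bits left
 cmp $0x4330,%ax
 jge .L0
 // frac(d) = d - trunc(d)
 cvttsd2si %xmm0, %rax
 cvtsi2sd %rax, %xmm4
 subsd %xmm4, %xmm0
 ret
.L0:
 // Large values, infinities and NaNs all yield 0
 xorpd %xmm0, %xmm0
end;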
//############################################################################//
procedure main;
var x:double;
i:longint; // integer is only 16 bits in the default FPC mode, too small for this loop
begin
 write('System: ');
 x:=0;
 for i:=0 to 9999999 do x:=x+frac(i/10);
 writeln(x:3:3);

 write('Custom: ');
 x:=0;
 for i:=0 to 9999999 do x:=x+frac_sse(i/10);
 writeln(x:3:3);
end;
//############################################################################//
begin
 main;
 {$ifdef mswindows}readln;{$endif}
end.
//############################################################################//

Tweak the iteration count until the run time is noticeable, then compare the time between platforms and between the system and custom frac.

On Intel, or on Windows, both are fast.
On AMD under Linux, the system one is 10-20 times slower.
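To get numbers instead of eyeballing it, the loop can be wrapped in a timer. This is a rough sketch, assuming GetTickCount64 from the sysutils unit is precise enough for a run of this length; the program name frac_time and the constant n are made up for illustration. The same pattern can be put around the frac_sse loop from the test program.

```pascal
program frac_time;
uses sysutils;
// Same iteration count as the test program; raise it if the times are too small.
const n=9999999;
var x:double;
 i:longint;
 t:qword;
begin
 x:=0;
 // Millisecond tick counter before the loop
 t:=GetTickCount64;
 for i:=0 to n do x:=x+frac(i/10);
 // Elapsed milliseconds for the system frac
 t:=GetTickCount64-t;
 writeln('System: ',x:3:3,' in ',t,' ms');
end.
```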

Mantis conversion info:

  • Mantis ID: 39275
  • Version: 3.3.1
  • Monitored by: » @Alexey-T1 (CudaText man)