Clang Performance Fix
Hi,
Last night I saw your post on ycombinator and was really surprised to see that Clang took about 12 hours to complete the entire test suite, whereas GCC took about 8 hours. I knew something was up because, in my experience, Clang's virtual function implementation is about 20% faster than GCC's. So I decided to dig in and found that Clang's `uniform_real_distribution` was the main cause of the slowdown. These findings are unrelated to whether the `final` keyword is present or not.
When run through perf, the Clang build made a lot more calls to libm than the GCC build, with the majority of the time spent calculating `logl`. Suspiciously, `_GeneralizedRandomGenerator` was showing up there too.
GCC's perf chart:
Looking through Clang's decompiled binary, all of the functions that call `RandomGenerator.get_real()` were calling `logl`. After some Googling around, I found other people experiencing the same thing: https://old.reddit.com/r/cpp/comments/17ite2b/highperformance_alternative_to_stduniform_real/k6x6d6a/
I have no idea how to add a patch/commit to GitLab so I've uploaded it to pastebin if you want to further investigate: https://pastebin.com/ByCmu2J6
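For anyone who wants to experiment with the idea, here is a minimal sketch of one way to get uniform reals without going through `std::uniform_real_distribution`. It is not the code from the pastebin patch: `canonical_from_bits` and `uniform_real` are hypothetical names, and it assumes a full-range 64-bit engine such as `std::mt19937_64`. It keeps the top 53 bits of the engine output and scales them into [0, 1), so no logarithm is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical helper, not taken from the pastebin patch.
// Maps one raw 64-bit engine output to a double in [0, 1).
template <typename Engine>
double canonical_from_bits(Engine &engine) {
    static_assert(Engine::min() == 0 &&
                      Engine::max() == std::numeric_limits<std::uint64_t>::max(),
                  "assumes a full-range 64-bit engine such as std::mt19937_64");
    // A double has a 53-bit significand, so keep the top 53 bits and
    // scale by 2^-53. The largest possible result is 1 - 2^-53.
    const std::uint64_t top_bits = static_cast<std::uint64_t>(engine()) >> 11;
    return static_cast<double>(top_bits) * (1.0 / 9007199254740992.0);  // 2^-53
}

// Hypothetical helper: uniform value in [lo, hi], where hi itself can only
// appear through floating-point rounding.
template <typename Engine>
double uniform_real(Engine &engine, double lo, double hi) {
    assert(lo <= hi);
    return lo + (hi - lo) * canonical_from_bits(engine);
}
```

With a `std::mt19937_64 rng(seed);` in scope, a call like `uniform_real(rng, -1.0, 1.0)` could stand in for a `get_real`-style helper. The sequence of values differs from what `std::uniform_real_distribution` produces, so renders would not be bit-identical to the original.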
Here are the results I got from running `book2::final_scene` with default settings on a Ryzen 5 7600X running Arch Linux 6.8.7. Results were generated by running `time for i in $(seq 1 10); do ./build/PSRayTracing ; done` and using the time reported by the program.
Clang 17.0.6 unpatched (s) | Clang 17.0.6 patched (s) | GCC 13.2.1 unpatched (s) | GCC 13.2.1 patched (s)
---|---|---|---
11.776 | 5.209 | 6.518 | 6.361 |
11.725 | 5.211 | 6.509 | 6.363 |
11.721 | 5.216 | 6.513 | 6.366 |
11.775 | 5.212 | 6.516 | 6.367 |
11.773 | 5.214 | 6.566 | 6.369 |
11.872 | 5.211 | 6.518 | 6.37 |
11.723 | 5.258 | 6.511 | 6.364 |
11.771 | 5.216 | 6.513 | 6.364 |
11.767 | 5.212 | 6.509 | 6.361 |
11.72 | 5.211 | 6.513 | 6.367 |
Clang's performance has doubled and beats GCC's by a solid second, and GCC has obtained a small performance increase of about 150ms.
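To double-check those figures, here is a small sketch that averages each column of the table above. The helper names (`mean`, `clang_speedup`, `gcc_saving_seconds`, `clang_lead_seconds`) are made up for illustration; the numbers are copied straight from the table.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of samples.
double mean(const std::vector<double> &samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Render times in seconds, copied from the results table.
const std::vector<double> clang_unpatched = {11.776, 11.725, 11.721, 11.775, 11.773,
                                             11.872, 11.723, 11.771, 11.767, 11.72};
const std::vector<double> clang_patched = {5.209, 5.211, 5.216, 5.212, 5.214,
                                           5.211, 5.258, 5.216, 5.212, 5.211};
const std::vector<double> gcc_unpatched = {6.518, 6.509, 6.513, 6.516, 6.566,
                                           6.518, 6.511, 6.513, 6.509, 6.513};
const std::vector<double> gcc_patched = {6.361, 6.363, 6.366, 6.367, 6.369,
                                         6.37, 6.364, 6.364, 6.361, 6.367};

// How many times faster patched Clang is than unpatched Clang.
double clang_speedup() { return mean(clang_unpatched) / mean(clang_patched); }

// Seconds saved by the patch under GCC.
double gcc_saving_seconds() { return mean(gcc_unpatched) - mean(gcc_patched); }

// Seconds by which patched Clang beats patched GCC.
double clang_lead_seconds() { return mean(gcc_patched) - mean(clang_patched); }
```

On these ten runs the averages work out to roughly a 2.25x speedup for Clang, about 0.15 s saved under GCC, and patched Clang finishing about 1.15 s ahead of patched GCC.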
For reference, here's an image from Clang with my fix applied:
And one from Clang without the patch: