Clang Performance Fix
Hi,

Last night I saw your post on ycombinator and was really surprised to see that Clang took about 12 hours to complete the entire test suite whereas GCC took about 8 hours. I knew something was up, because in my experience Clang's virtual function implementation is about 20% faster than GCC's. So I decided to dig in and found that Clang's `uniform_real_distribution` was the main cause of the slowdown. These findings are unrelated to whether the `final` keyword is present or not.

When run through perf, Clang made a lot more calls into `libm` than GCC, with the majority of the time spent calculating `logl`. Suspiciously, `_GeneralizedRandomGenerator` shows up there as well.

![image.png](/uploads/2422be25635f7746e424b67f51d7d2ce/image.png){width="919" height="147"}

GCC's perf chart:

![image.png](/uploads/b1d77423b5c594f4c723aa3922f828fc/image.png){width="968" height="71"}

Looking through Clang's decompiled binary, all of the functions that call `RandomGenerator.get_real()` were calling `logl`. After some Googling I found other people experiencing the same thing: https://old.reddit.com/r/cpp/comments/17ite2b/highperformance_alternative_to_stduniform_real/k6x6d6a/

I have no idea how to add a patch/commit to GitLab, so I've uploaded it to pastebin if you want to investigate further: https://pastebin.com/ByCmu2J6

Here are the results I got from running `book2::final_scene` with default settings on a Ryzen 5 7600X running Arch Linux 6.8.7. Results were generated by running `time for i in $(seq 1 10); do ./build/PSRayTracing ; done` and using the time reported by the program.
All times are in seconds.

| Clang 17.0.6 unpatched | Clang 17.0.6 patched | GCC 13.2.1 unpatched | GCC 13.2.1 patched |
|------------------------|----------------------|----------------------|--------------------|
| 11.776 | 5.209 | 6.518 | 6.361 |
| 11.725 | 5.211 | 6.509 | 6.363 |
| 11.721 | 5.216 | 6.513 | 6.366 |
| 11.775 | 5.212 | 6.516 | 6.367 |
| 11.773 | 5.214 | 6.566 | 6.369 |
| 11.872 | 5.211 | 6.518 | 6.37 |
| 11.723 | 5.258 | 6.511 | 6.364 |
| 11.771 | 5.216 | 6.513 | 6.364 |
| 11.767 | 5.212 | 6.509 | 6.361 |
| 11.72 | 5.211 | 6.513 | 6.367 |

Clang's run time has more than halved (from about 11.8 s to about 5.2 s) and it now beats GCC by a solid second. GCC also gets a small improvement of about 150 ms.

For reference, here's an image from Clang with my fix applied:

![image.png](/uploads/838ec0e2c16cea8bb28548ca260e08c9/image.png)

And one from Clang without the patch:

![image.png](/uploads/7cfb5581549698ef41d0b4551f1c9093/image.png)
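For readers who don't want to open the pastebin, here is a rough illustration of the general idea of generating uniform reals without going through `std::uniform_real_distribution`. This is only a minimal sketch and not necessarily what the patch does: `bits_to_unit_double` and `fast_uniform_real` are hypothetical names that do not come from PSRayTracing, and the code assumes a 64-bit engine such as `std::mt19937_64`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical helper (not taken from the patch): maps 64 random bits to a
// double in [0, 1) using only a shift, an integer-to-double conversion and
// a multiply, so no libm call is involved.
inline double bits_to_unit_double(std::uint64_t bits) {
    // A double has a 53-bit significand, so keep the top 53 bits
    // and scale them by 2^-53.
    constexpr double inv_2_pow_53 = 1.0 / 9007199254740992.0;  // 1 / 2^53
    return static_cast<double>(bits >> 11) * inv_2_pow_53;
}

// Hypothetical stand-in for a get_real(lo, hi)-style call.
// Assumption: the engine produces uniformly distributed 64-bit values.
template <class Engine>
double fast_uniform_real(Engine &engine, double lo, double hi) {
    static_assert(Engine::min() == 0 &&
                      Engine::max() == std::numeric_limits<std::uint64_t>::max(),
                  "this sketch assumes a full-range 64-bit engine");
    assert(lo <= hi);
    // The result lies in [lo, hi), up to floating-point rounding.
    return lo + (hi - lo) * bits_to_unit_double(engine());
}
```

Usage would look like `std::mt19937_64 rng(seed); double x = fast_uniform_real(rng, 0.0, 1.0);`. The trade-off is that this is tied to a 64-bit engine, while `std::uniform_real_distribution` works with any standard engine.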