Performance improvements (general) and AVX-512 support (B3/S23)

Adam P. Goucher requested to merge avx512 into master

Soup searching is 4% faster on an AVX2 machine (3875 --> 4025 soups/second) and 40% faster on an AVX-512 machine (3600 --> 5040 soups/second). On AVX-512, we compute GoL using just 2 binary operations and 7 ternary operations (per cell, amortized) and make effective use of shift and shuffle instructions. Profiling suggests that on AVX-512 memory latency is now a significant factor, since the operation count is so low. Running larger tiles for more generations may help to address this.
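
To illustrate why ternary operations cut the operation count, here is a minimal portable sketch of a bit-sliced B3/S23 step on 64-bit rows. It is not the circuit in this merge request and does not reproduce the 2 binary + 7 ternary count. The names `xor3`, `maj3`, `Grid` and `life_step` are hypothetical, and cells outside the 64x64 grid are assumed dead. Each helper is a three-input boolean function, which on AVX-512 can be issued as a single `vpternlogq` (immediates 0x96 and 0xE8 respectively).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Three-input XOR: the low bit of a + b + c (vpternlogq immediate 0x96).
inline std::uint64_t xor3(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return a ^ b ^ c;
}

// Three-input majority: the carry bit of a + b + c (vpternlogq immediate 0xE8).
inline std::uint64_t maj3(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return (a & b) | (c & (a | b));
}

constexpr std::size_t kRows = 64;             // hypothetical tile height
using Grid = std::array<std::uint64_t, kRows>; // one 64-cell row per word

// Advance the grid by one generation of B3/S23.
Grid life_step(const Grid& g) {
    // Horizontal sums of (left, centre, right) as two bit-planes.
    // Padded with a zero row above and below so the edges see dead cells.
    std::array<std::uint64_t, kRows + 2> lo{}, hi{};
    for (std::size_t i = 0; i < kRows; ++i) {
        const std::uint64_t c = g[i], l = c << 1, r = c >> 1;
        lo[i + 1] = xor3(l, c, r);
        hi[i + 1] = maj3(l, c, r);
    }

    Grid out{};
    for (std::size_t i = 0; i < kRows; ++i) {
        // Add the three row sums (above, current, below) bit-plane by bit-plane.
        const std::uint64_t ones    = xor3(lo[i], lo[i + 1], lo[i + 2]);
        const std::uint64_t carry   = maj3(lo[i], lo[i + 1], lo[i + 2]);
        const std::uint64_t twos_a  = xor3(hi[i], hi[i + 1], hi[i + 2]);
        const std::uint64_t fours_a = maj3(hi[i], hi[i + 1], hi[i + 2]);
        const std::uint64_t twos    = carry ^ twos_a;
        const std::uint64_t fours   = fours_a ^ (carry & twos_a);

        // The total includes the centre cell and is at most 9, so the
        // eights bit never needs checking: 8 and 9 have twos = fours = 0.
        const std::uint64_t is3 = ones & twos & ~fours;           // total == 3
        const std::uint64_t is4 = ~ones & ~twos & fours & g[i];   // total == 4, centre alive
        out[i] = is3 | is4;
    }
    return out;
}
```

For example, a horizontal blinker placed in one row becomes a vertical blinker after one call to `life_step` and returns to its original state after two. A production kernel would process many rows per vector register and fuse more of the final logic into ternary operations.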