
Gemv microoptimization

Reference issue

What does this implement/fix?

Explicitly defining the loop bounds for the unrolled stages that increment by PacketSize fixes the compiler's aggressive loop optimization warnings.

To compute those bounds, I used a trick I learned that minimizes the overhead of rounding down to the nearest multiple of a power of two. Dividing and then multiplying by a compile-time power of two compiles to a right shift followed by a left shift. This can be further reduced to a single bitwise AND that clears the low bits.
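A minimal sketch of the idea follows. It is not the actual Eigen gemv kernel. `Index` is assumed to be a signed type like Eigen's default (`std::ptrdiff_t`). `PacketSize = 4`, `round_down_div_mul`, `round_down_mask` and `sum_unrolled` are hypothetical names chosen for illustration. The scalar inner loops stand in for packet (SIMD) operations.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a signed index type, like Eigen's default Index (std::ptrdiff_t).
using Index = std::ptrdiff_t;

// Hypothetical packet width; the mask trick requires a power of two.
constexpr Index PacketSize = 4;
static_assert((PacketSize & (PacketSize - 1)) == 0, "PacketSize must be a power of two");

// Round n down to a multiple of PacketSize by dividing, then multiplying.
inline Index round_down_div_mul(Index n) { return (n / PacketSize) * PacketSize; }

// Same result for n >= 0: one AND that clears the low log2(PacketSize) bits.
inline Index round_down_mask(Index n) {
  assert(n >= 0 && "mask trick assumes a non-negative index");
  return n & ~(PacketSize - 1);
}

// Hypothetical unrolled reduction with an explicit end bound for each stage.
// The inner k-loops stand in for packet loads and adds.
inline double sum_unrolled(const double* x, Index n) {
  const Index end4 = n & ~(4 * PacketSize - 1);  // one past the end of the 4-packet stage
  const Index end1 = round_down_mask(n);         // one past the end of the 1-packet stage
  double acc = 0.0;
  Index i = 0;
  for (; i < end4; i += 4 * PacketSize)
    for (Index k = 0; k < 4 * PacketSize; ++k) acc += x[i + k];
  for (; i < end1; i += PacketSize)
    for (Index k = 0; k < PacketSize; ++k) acc += x[i + k];
  for (; i < n; ++i) acc += x[i];  // scalar tail
  return acc;
}
```

For n = 23, the 4-packet stage ends at 16, the 1-packet stage ends at 20, and the scalar tail handles indices 20 to 22. Because each stage has a precomputed end bound, the compiler does not need to derive the trip count from a running `i + PacketSize <= n` style condition.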

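The compiler applies this optimization automatically, but only if the type is an unsigned integer. Index is a signed integer, and signed division rounds toward zero, so the compiler plays it safe and emits extra instructions to correct the result for negative values. Our indices are always non-negative, so we can skip this correction and use the bitwise AND directly.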
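The compile-time checks below illustrate why the compiler cannot substitute the AND for a signed type. It is a sketch that assumes a two's-complement representation (guaranteed since C++20), a signed `Index` like Eigen's, and a hypothetical packet width of 4. `div_mul`, `mask_round` and `div_mul_unsigned` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>

using Index = std::ptrdiff_t;    // assumption: signed index, like Eigen's default
constexpr Index PacketSize = 4;  // hypothetical power-of-two packet width

// Signed division truncates toward zero, so negative n is rounded *up* toward zero.
constexpr Index div_mul(Index n) { return (n / PacketSize) * PacketSize; }

// Masking clears the low bits, which rounds toward negative infinity in two's complement.
constexpr Index mask_round(Index n) { return n & ~(PacketSize - 1); }

// For unsigned types the two forms always agree, so compilers emit the AND on their own.
constexpr std::size_t div_mul_unsigned(std::size_t n) { return (n / 4) * 4; }

static_assert(div_mul(13) == 12 && mask_round(13) == 12, "agree for non-negative n");
static_assert(div_mul(-13) == -12, "signed division truncates toward zero");
static_assert(mask_round(-13) == -16, "the mask rounds toward negative infinity");
static_assert(div_mul_unsigned(13) == (std::size_t{13} & ~std::size_t{3}), "unsigned forms agree");
```

Because the two forms differ only for negative inputs, knowing that the index is non-negative is exactly what makes the single AND safe.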

https://godbolt.org/z/a6drKb6W8

I wanted to address this fix before cherry-picking it to 3.4.

Additional information

Edited by Charles Schlosser
