```fortran
do i = 1, n
   A(i) = sin(B(i))
end do
```
gets transformed into
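```fortran
do i = 1, n - modulo(n, 32), 32        ! start of each full 32-element tile
   do j = 0, 31                        ! offset within the tile
      A(i+j) = sin(B(i+j))
   end do
end do
do i = n - modulo(n, 32) + 1, n        ! remainder: fewer than 32 elements
   A(i) = sin(B(i))
end do
```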
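The loop bounds in a tiled nest are easy to get off by one, so it can help to check that every index is visited exactly once. The module below is a small sketch that does this. The names `tile_bounds`, `visit_counts` and the tile-size parameter `nb` are hypothetical and chosen for illustration. It records how often the main tiles plus the remainder loop touch each index `1..n`.

```fortran
module tile_bounds
   implicit none
   ! Hypothetical tile size; the example in the text uses 32.
   integer, parameter :: nb = 32
contains
   ! For each index 1..n, count how many times the tiled loop nest
   ! (full tiles followed by the remainder loop) visits it.
   ! A correct nest gives a count of exactly 1 everywhere.
   function visit_counts(n) result(cnt)
      integer, intent(in) :: n
      integer :: cnt(n)
      integer :: i, j, nmain
      cnt = 0
      nmain = n - modulo(n, nb)          ! last index covered by full tiles
      do i = 1, nmain, nb                ! first index of each full tile
         do j = 0, nb - 1                ! offsets 0..nb-1 within the tile
            cnt(i+j) = cnt(i+j) + 1
         end do
      end do
      do i = nmain + 1, n                ! remainder loop (may be empty)
         cnt(i) = cnt(i) + 1
      end do
   end function visit_counts
end module tile_bounds
```

Starting the inner offset at 0 rather than 1 is what keeps `i+j` inside the current tile. If it started at 1, the first index of each tile would be skipped and the last access would land one past the tile.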
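The inner loop is then effectively an array operation on one 32-element tile:

```fortran
do i = 1, n - modulo(n, 32), 32
   A(i:i+31) = sin(B(i:i+31))          ! i:i+31 is exactly 32 elements
end do
A(n-modulo(n,32)+1:n) = sin(B(n-modulo(n,32)+1:n))   ! remainder
```

where `sin` operates on a vector instead of a scalar.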
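The same array-slice form can be packaged as a self-contained routine and checked against a plain whole-array `sin`. This is a minimal sketch, not the compiler's actual transformation. The module name `tiled_sin`, the subroutine `sin_by_slices`, and the parameters `nb` and `dp` are hypothetical, and the routine assumes `a` and `b` have the same size.

```fortran
module tiled_sin
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nb = 32         ! tile size, as in the text
contains
   ! Compute a = sin(b) one nb-element slice at a time, so each call to
   ! sin receives an array section rather than a single scalar.
   ! Assumes size(a) == size(b).
   subroutine sin_by_slices(a, b)
      real(dp), intent(out) :: a(:)
      real(dp), intent(in)  :: b(:)
      integer :: i, n, nmain
      n = size(b)
      nmain = n - modulo(n, nb)          ! elements covered by full tiles
      do i = 1, nmain, nb
         a(i:i+nb-1) = sin(b(i:i+nb-1))  ! one full tile of nb elements
      end do
      a(nmain+1:n) = sin(b(nmain+1:n))   ! remainder section, possibly empty
   end subroutine sin_by_slices
end module tiled_sin
```

Because `sin` is elemental in Fortran, a zero-length remainder section is legal and simply does nothing. No special case is needed when `n` is a multiple of 32.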
There are two sources of speedup:
- The tiling itself keeps each small, 32-element tile in L1 cache, which gives faster access and operations.
- Special function implementations, such as `sin`, are generally faster when operating on a vector, because a more efficient (e.g. SIMD) implementation can be used. So replacing the scalar `sin` with a vector `sin` will typically be faster.
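Whether these speedups show up in practice depends on the compiler, its flags and the hardware. One option is to measure them directly. The sketch below uses the standard `system_clock` intrinsic to time a plain element-by-element loop against the 32-element slice form. The names `sin_timing`, `time_sin`, `nb` and `dp` are hypothetical. One caveat applies: the compiler may vectorize the plain loop as well, in which case the two timings will be similar. A single run is also noisy, so treat the numbers as indicative only.

```fortran
module sin_timing
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nb = 32
contains
   ! Time two ways of computing sin(b); return both results and the
   ! elapsed wall-clock seconds. Assumes a1, a2 and b have the same size.
   subroutine time_sin(b, a1, a2, t_scalar, t_slice)
      real(dp), intent(in)  :: b(:)
      real(dp), intent(out) :: a1(:), a2(:)
      real(dp), intent(out) :: t_scalar, t_slice
      integer(int64) :: c0, c1, rate
      integer :: i, n, nmain
      n = size(b)
      nmain = n - modulo(n, nb)
      ! Plain loop: one scalar sin per element (may still be auto-vectorized).
      call system_clock(c0, rate)
      do i = 1, n
         a1(i) = sin(b(i))
      end do
      call system_clock(c1)
      t_scalar = real(c1 - c0, dp) / real(rate, dp)
      ! Tiled form: sin applied to nb-element slices, then the remainder.
      call system_clock(c0)
      do i = 1, nmain, nb
         a2(i:i+nb-1) = sin(b(i:i+nb-1))
      end do
      a2(nmain+1:n) = sin(b(nmain+1:n))
      call system_clock(c1)
      t_slice = real(c1 - c0, dp) / real(rate, dp)
   end subroutine time_sin
end module sin_timing
```

Building the same program at different optimization levels (for example `-O0` versus `-O3` with gfortran) and comparing `t_scalar` with `t_slice` gives a rough sense of how much of the gain comes from the compiler's own transformation.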