Add half and quarter vector support to HVX architecture

Reference issue

What does this implement/fix?

Since HVX uses 128-byte (1024-bit) vector registers, full-size vectorization cannot benefit data smaller than 128 bytes. This change allows half and quarter of an HVX vector to be used for vectorization through the "half" type in "packet_traits" and "unpacket_traits". For small matrix multiplication (matrix sizes ranging from 8 to 31 single-precision float elements), this change yields a 1.37X-3.1X speedup on Snapdragon XR2 Gen 2.
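To illustrate the idea of a "half" type chain, here is a minimal standalone sketch. It is not the code in this merge request: `FullPacket`, `HalfPacket`, `QuarterPacket`, `toy_unpacket_traits`, and `pick_packet` are hypothetical names invented for this example, and the selection rule (take the widest packet whose lane count does not exceed the data size) is an assumption about how a caller might use such a chain.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for packet types; these are NOT the real HVX types.
struct FullPacket    { float lanes[32]; };  // one full 128-byte vector
struct HalfPacket    { float lanes[16]; };  // half of a vector
struct QuarterPacket { float lanes[8];  };  // quarter of a vector

// Toy traits: each packet names the next narrower packet as its "half".
template <typename Packet> struct toy_unpacket_traits;

template <> struct toy_unpacket_traits<FullPacket> {
  static constexpr std::size_t size = 32;
  using half = HalfPacket;
};
template <> struct toy_unpacket_traits<HalfPacket> {
  static constexpr std::size_t size = 16;
  using half = QuarterPacket;
};
template <> struct toy_unpacket_traits<QuarterPacket> {
  static constexpr std::size_t size = 8;
  using half = QuarterPacket;  // end of the chain: "half" refers to itself
};

// Walk the half chain until the packet fits into N elements
// or the chain ends (assumed selection rule, for illustration only).
template <std::size_t N, typename Packet,
          bool Stop = (toy_unpacket_traits<Packet>::size <= N) ||
                      std::is_same<Packet,
                                   typename toy_unpacket_traits<Packet>::half>::value>
struct pick_packet {
  using type = Packet;
};

template <std::size_t N, typename Packet>
struct pick_packet<N, Packet, false> {
  using type =
      typename pick_packet<N, typename toy_unpacket_traits<Packet>::half>::type;
};

// Usage: 8 floats map to the quarter packet, 20 floats to the half packet.
static_assert(std::is_same<pick_packet<8, FullPacket>::type, QuarterPacket>::value,
              "8 floats should select the quarter packet");
static_assert(std::is_same<pick_packet<20, FullPacket>::type, HalfPacket>::value,
              "20 floats should select the half packet");
```

In this sketch, sizes below 8 elements still resolve to the quarter packet because the chain has nowhere narrower to go; a real caller would need a scalar fallback for that case.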

Additional information

The change is confined to the HVX-specific packet math file, so it should not impact other architectures.

Edited by Cheng Wang
