The source project of this merge request has been removed.
Add half and quarter vector support to HVX architecture
Reference issue
What does this implement/fix?
Since HVX uses 128-byte (1024-bit) vector registers, full-size vectorization cannot benefit data smaller than 128 bytes. This change allows half and quarter HVX vectors to be used for vectorization through the "half" type in "packet_traits" and "unpacket_traits". For small matrix multiplication (matrix sizes ranging from 8 to 31 single-precision float elements), this change yields a 1.37x-3.1x speedup on Snapdragon XR2 Gen 2.
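To illustrate the idea, here is a minimal, self-contained sketch of how a chain of "half" types can let generic code fall back from a full packet to a half or quarter packet. It does not use Eigen or HVX intrinsics: `MockPacket`, `mock_unpacket_traits`, and `widest_fitting_lanes` are hypothetical names invented for this example, and the convention that the smallest packet names itself as its own "half" is an assumption of the sketch, not something taken from this change. The lane counts follow from the register size: 128 bytes hold 32 floats, so half and quarter packets hold 16 and 8.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a packet type holding N floats.
// A real HVX packet would wrap a vector register instead of an array.
template <std::size_t N>
struct MockPacket {
  float lanes[N];
};

using FullPacket = MockPacket<32>;     // 128 bytes
using HalfPacket = MockPacket<16>;     // 64 bytes
using QuarterPacket = MockPacket<8>;   // 32 bytes

// Hypothetical traits class: each packet names the next smaller packet as `half`.
template <typename P>
struct mock_unpacket_traits;

template <>
struct mock_unpacket_traits<FullPacket> {
  using half = HalfPacket;
  static constexpr std::size_t size = 32;
};

template <>
struct mock_unpacket_traits<HalfPacket> {
  using half = QuarterPacket;
  static constexpr std::size_t size = 16;
};

template <>
struct mock_unpacket_traits<QuarterPacket> {
  using half = QuarterPacket;  // assumption: the chain ends when half == self
  static constexpr std::size_t size = 8;
};

// Walk the `half` chain starting at P and return the lane count of the widest
// packet that fits in n floats, or 0 if even the smallest packet is too wide
// (meaning scalar code would be needed).
template <typename P>
constexpr std::size_t widest_fitting_lanes(std::size_t n) {
  using Half = typename mock_unpacket_traits<P>::half;
  if (mock_unpacket_traits<P>::size <= n) {
    return mock_unpacket_traits<P>::size;
  }
  if constexpr (std::is_same_v<P, Half>) {
    return 0;  // end of the chain, nothing fits
  } else {
    return widest_fitting_lanes<Half>(n);
  }
}
```

Under these assumptions, `widest_fitting_lanes<FullPacket>(31)` selects the 16-lane half packet and `widest_fitting_lanes<FullPacket>(8)` selects the 8-lane quarter packet, which are the sizes where a full 32-lane packet could not be used at all.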
Additional information
The change is confined to the HVX-specific packet math file, so it should not impact other architectures.
Edited by Cheng Wang