SVE intrinsics with the "_x" suffix can be faster than those with the "_z" suffix

Reference issue

What does this implement/fix?

This MR replaces the SVE intrinsics that use the "_z" suffix with their "_x" counterparts for better performance. The difference between the two forms is what happens to the lanes where the predicate is false: with "_z" the corresponding elements of the result are set to 0, while with "_x" they are left unspecified ("don't care"). As a consequence, for "_z" the compiler may have to emit an extra "SEL" instruction and use an extra register to hold 0.

Additional information

For example, "svadd" is translated to "ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>". This instruction supports only merging predication (/M), not zeroing (/Z), so svadd_z needs an additional "SEL" between Zdn and 0.

I can provide some code to illustrate this more clearly if that would help.
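A minimal sketch of the kind of change described here, assuming a simple element-wise float addition; the function names `add_f32_z` and `add_f32_x` are hypothetical and are not taken from this MR. Both variants store only the active lanes, so the inactive lanes of the result are never read and the "_x" form is a safe replacement. A scalar fallback is included so the sketch also compiles on targets without SVE.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i], using the zeroing ("_z") form.
 * Inactive lanes of the sum are forced to 0, which the compiler may have to
 * materialise with extra instructions. */
void add_f32_z(const float *a, const float *b, float *out, int64_t n)
{
    assert(a && b && out && n >= 0);
#if defined(__ARM_FEATURE_SVE)
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);     /* lanes with i + lane < n */
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svfloat32_t vr = svadd_f32_z(pg, va, vb);  /* inactive lanes = 0 */
        svst1_f32(pg, out + i, vr);                /* only active lanes stored */
    }
#else
    /* Scalar fallback for targets without SVE. */
    for (int64_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
#endif
}

/* Hypothetical helper: same computation using the "don't care" ("_x") form.
 * Inactive lanes of the sum are unspecified, so the compiler is free to pick
 * the cheapest encoding. */
void add_f32_x(const float *a, const float *b, float *out, int64_t n)
{
    assert(a && b && out && n >= 0);
#if defined(__ARM_FEATURE_SVE)
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svfloat32_t vr = svadd_f32_x(pg, va, vb);  /* inactive lanes unspecified */
        svst1_f32(pg, out + i, vr);                /* only active lanes stored */
    }
#else
    for (int64_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
#endif
}
```

Because the store uses the same predicate as the addition, both functions write identical results. Whether the "_x" version is measurably faster depends on the compiler and the surrounding code.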
