SVE intrinsics with the "_x" suffix can be faster than those with the "_z" suffix
Reference issue
What does this implement/fix?
This MR replaces the SVE intrinsics with the "_z" suffix by their "_x" counterparts for better performance. The difference between the two is what happens to the lanes where the governing predicate is false: with "_z" the corresponding elements of the result are set to 0, while with "_x" their value is unspecified ("don't care"), so the compiler is free to pick the cheapest instruction form. With "_z" the compiler may have to emit an extra "SEL" instruction and use an extra register to hold 0.
Additional information
For example, "svadd" is translated to "ADD Zdn.T, Pg/M, Zdn.T, Zm.T", which only has a merging (/M) form. Because the instruction does not support zeroing (/Z), svadd_z needs an extra "SEL" between Zdn and 0.
Maybe I can provide some code to illustrate it more clearly.
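As a rough sketch of the difference, the two functions below add two float arrays with the "_z" and "_x" forms of svadd. The function names add_f32_z and add_f32_x are hypothetical and are not taken from this MR. Since every result is written with a predicated store, the inactive lanes never reach memory, so both variants produce the same output; only the generated code is expected to differ, and how much depends on the compiler. A scalar fallback is included as an assumption so the sketch also builds on targets without SVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* "_z" form: inactive lanes of the sum must be zero, so the compiler
   has to materialise that zero even though it is never stored. */
void add_f32_z(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64((uint64_t)i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_z(pg, va, vb));
    }
}

/* "_x" form: inactive lanes are "don't care", so the compiler may use
   the merging ADD directly (or an unpredicated ADD). */
void add_f32_x(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64((uint64_t)i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
}

#else

/* Scalar fallback (assumption, for non-SVE builds only): both variants
   reduce to the same element-wise addition. */
void add_f32_z(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

void add_f32_x(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

#endif
```

Compiling both functions for an SVE target and comparing the assembly of the two loops should show whether the "_z" version carries the extra zeroing step on a given compiler.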