arm packet alignment requirements and aligned loads/stores

Reference issue

What does this implement/fix?

  1. The alignment requirements for some Arm SIMD vector types are stricter than necessary.
  2. Arm does not provide separate intrinsics for aligned loads and stores. On Arm32, we can pass an alignment hint that makes the compiler generate the aligned instructions. Arm64 appears to ignore these hints.

https://godbolt.org/z/6dd33M4Wq
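
For readers who do not want to open the link, here is a minimal sketch of what an alignment hint can look like. It is not the code from this merge request: `hint_aligned` and `copy4_aligned` are hypothetical names, and the sketch assumes GCC or Clang, where `__builtin_assume_aligned` is available. On Arm32 the hint may let the compiler emit alignment-qualified NEON loads and stores; as noted above, Arm64 appears to ignore it. A scalar fallback keeps the sketch compilable on non-Arm targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper (not from the merge request): promise the compiler
// that `p` is N-byte aligned. The assert catches broken promises in debug builds.
template <std::size_t N, typename T>
inline T* hint_aligned(T* p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % N == 0);
#if defined(__GNUC__) || defined(__clang__)
  return static_cast<T*>(__builtin_assume_aligned(p, N));
#else
  return p;  // no hint available; behaves like a plain pointer
#endif
}

// Hypothetical example: copy four floats through a 128-bit vector,
// passing the 16-byte alignment hint to the ordinary NEON intrinsics.
inline void copy4_aligned(const float* src, float* dst) {
#if SKETCH_HAS_NEON
  float32x4_t v = vld1q_f32(hint_aligned<16>(src));
  vst1q_f32(hint_aligned<16>(dst), v);
#else
  // Scalar fallback so the sketch also builds on non-Arm targets.
  const float* s = hint_aligned<16>(src);
  float* d = hint_aligned<16>(dst);
  for (int i = 0; i < 4; ++i) d[i] = s[i];
#endif
}
```

Both pointers must really be 16-byte aligned (for example, arrays declared with `alignas(16)`); passing a misaligned pointer through the hint is undefined behavior.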

Can anyone benchmark this?

Additional information
