Arm packet alignment requirements and aligned loads/stores
Reference issue
What does this implement/fix?
- The alignment requirements for some Arm SIMD vector types are stricter than necessary.
- Arm does not provide dedicated intrinsics for aligned loads and stores. On arm32, we can instead attach an alignment hint to the pointer, which makes the compiler emit the aligned forms of the load/store instructions. Arm64 compilers appear to ignore these hints.
https://godbolt.org/z/6dd33M4Wq
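For readers who do not want to open the link, here is a minimal sketch of the alignment-hint idea. It is an illustration, not the code in this change: the names `assume_aligned16`, `pload_aligned`, and `pstore_aligned` are hypothetical, and it assumes the hint is expressed with the GCC/Clang builtin `__builtin_assume_aligned`. A scalar fallback is included only so the snippet compiles on non-NEON targets.

```cpp
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
using Packet4f = float32x4_t;
#else
// Fallback so the sketch also builds on targets without NEON.
struct Packet4f { float v[4]; };
#endif

// Hypothetical helper: promise the compiler that `p` is 16-byte aligned.
// Assumption: GCC/Clang builtin; other compilers get the pointer unchanged.
template <typename T>
inline T* assume_aligned16(T* p) {
#if defined(__GNUC__) || defined(__clang__)
  return static_cast<T*>(__builtin_assume_aligned(p, 16));
#else
  return p;
#endif
}

// Hypothetical aligned load: the ordinary NEON load intrinsic applied to a
// pointer that carries the alignment hint.
inline Packet4f pload_aligned(const float* from) {
  const float* p = assume_aligned16(from);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return vld1q_f32(p);
#else
  Packet4f r;
  std::memcpy(r.v, p, sizeof(r.v));
  return r;
#endif
}

// Hypothetical aligned store, mirroring the load above.
inline void pstore_aligned(float* to, const Packet4f& from) {
  float* p = assume_aligned16(to);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  vst1q_f32(p, from);
#else
  std::memcpy(p, from.v, sizeof(from.v));
#endif
}
```

Whether the hint changes the generated instructions depends on the target and compiler, so the output should be checked in the disassembly. Passing a pointer that is not actually 16-byte aligned is undefined behavior.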
Can anyone benchmark this?