Add pload_partial, pstore_partial (and unaligned versions), pgather_partial, pscatter_partial, loadPacketPartial and storePacketPartial.
Add ploadN, pstoreN (and unaligned versions), pgatherN, pscatterN, loadPacketN and storePacketN.
Useful for:
- memory access - prevents reading/writing past the end of the data (only the elements needed are touched).
- performance - eliminates masking, uses one Packet instead of N scalars, and reduces complexity in edge-condition functions/templates (better i-cache usage), etc.
- partial Packet operations - a single simplified Packet operation replaces the sequence of reading scalars, merging them into a Packet, performing the operation, extracting scalars, and writing the scalars back.
- consistent results - reduces variations between scalar and packet operations.
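
The idea behind the points above can be illustrated with a small standalone sketch. It is not the code from this change and does not use Eigen: `PacketSketch`, `load_partial_sketch`, `store_partial_sketch`, `mul_sketch` and `scale_in_place_sketch` are hypothetical names, the packet is modeled as a plain 4-element array, and zero-filling the unused lanes is an assumption made only for this illustration. It shows a tail of fewer than 4 elements being handled with one partial packet load/store instead of a separate scalar loop, without touching memory past the end of the data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-lane SIMD packet (not an Eigen type).
constexpr std::size_t kPacketSize = 4;
struct PacketSketch {
  std::array<float, kPacketSize> lanes{};  // value-initialized to zero
};

// Load only the first n elements of `from`; the remaining lanes stay zero
// (an assumption of this sketch). Memory at from[n] and beyond is never read.
inline PacketSketch load_partial_sketch(const float* from, std::size_t n) {
  assert(n <= kPacketSize);
  PacketSketch p;
  for (std::size_t i = 0; i < n; ++i) p.lanes[i] = from[i];
  return p;
}

// Store only the first n lanes of `p`; memory at to[n] and beyond is never written.
inline void store_partial_sketch(float* to, const PacketSketch& p, std::size_t n) {
  assert(n <= kPacketSize);
  for (std::size_t i = 0; i < n; ++i) to[i] = p.lanes[i];
}

// Lane-wise multiply by a scalar, standing in for a real packet operation.
inline PacketSketch mul_sketch(const PacketSketch& a, float s) {
  PacketSketch r;
  for (std::size_t i = 0; i < kPacketSize; ++i) r.lanes[i] = a.lanes[i] * s;
  return r;
}

// Scale an array of arbitrary length: full packets in the main loop,
// then a single partial packet for the tail (no scalar clean-up loop).
inline void scale_in_place_sketch(float* data, std::size_t size, float s) {
  std::size_t i = 0;
  for (; i + kPacketSize <= size; i += kPacketSize) {
    const PacketSketch p = load_partial_sketch(data + i, kPacketSize);
    store_partial_sketch(data + i, mul_sketch(p, s), kPacketSize);
  }
  const std::size_t tail = size - i;
  if (tail > 0) {
    const PacketSketch p = load_partial_sketch(data + i, tail);
    store_partial_sketch(data + i, mul_sketch(p, s), tail);
  }
}
```

In this sketch the tail goes through the same packet operation as the main loop, which is the sense in which partial packets can reduce scalar vs packet differences; a real SIMD backend would do the partial load/store with hardware support or a fallback rather than a per-element loop.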
Edited by Chip Kerchner