Allocating SIMD-aligned arrays
@tenzing, it would be cool if we could allocate SIMD-aligned SharedArrays. We're doing a lot of FFTs in some instances, and apparently FFTW can be significantly faster on aligned data.
I think this simply means that the starting address must be a multiple of 16, 32, or 64 bytes, depending on the SIMD width (I still need to figure out how to detect this on a given architecture). I guess we could add 16/32/64 bytes to the size here, and then nudge the pointer forward to make sure the array data is aligned?
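To make the over-allocate-and-nudge idea concrete, here is a rough sketch. It is written in Python with `ctypes` purely for illustration, not against the SharedArray code itself. The name `aligned_buffer` and the default of 64 bytes are my own assumptions, and I haven't checked how this would map onto the shared-memory segment.

```python
import ctypes

def aligned_buffer(nbytes, align=64):
    """Return a ctypes byte buffer of `nbytes` whose address is a multiple of `align`.

    Hypothetical helper for illustration: over-allocate by `align` bytes,
    then skip forward to the first aligned address inside the allocation.
    """
    if align <= 0:
        raise ValueError("align must be positive")
    # Over-allocate so an aligned start is guaranteed to fit.
    raw = (ctypes.c_char * (nbytes + align))()
    addr = ctypes.addressof(raw)
    # Number of bytes to skip to reach the next multiple of `align`.
    offset = (-addr) % align
    # The view shares memory with `raw` and keeps it alive.
    return (ctypes.c_char * nbytes).from_buffer(raw, offset)

buf = aligned_buffer(1024, align=32)
print(ctypes.addressof(buf) % 32 == 0)
```

The cost is at most `align` wasted bytes per allocation, which seems negligible for the array sizes we FFT.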