Optimize the transform in av1_fwd_txfm2d_64x64_avx2() and av1_fwd_txfm2d_32x64_avx2()
Created by: spawlows
Ported the load done by int16_array_with_stride_to_int32_array_without_stride() to AVX2, along with other optimizations.
Measured encoding speed increased by ~1.4% in M6. The Av1EstimateTransform kernel now takes 7.1% instead of 8.5%.
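
For illustration only, the int16-with-stride to int32-without-stride load can be sketched roughly as below. This is not the code from this change: the function names (copy_s16_strided_to_s32 and its _c / _avx2 variants) are hypothetical, the sketch assumes GCC or Clang on x86, and it assumes the block width is a multiple of 8 on the AVX2 path, which holds for the 64x64 and 32x64 sizes.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: read a width x height block of int16 samples using a row
 * stride and write it as a densely packed int32 block (row pitch == width). */
static void copy_s16_strided_to_s32_c(const int16_t *src, ptrdiff_t stride,
                                      int32_t *dst, int width, int height) {
    for (int r = 0; r < height; r++)
        for (int c = 0; c < width; c++)
            dst[r * width + c] = src[r * stride + c];
}

/* AVX2 sketch (hypothetical name): load 8 int16 values, sign-extend them to
 * 8 int32 values, and store them contiguously. Assumes width % 8 == 0. */
__attribute__((target("avx2")))
static void copy_s16_strided_to_s32_avx2(const int16_t *src, ptrdiff_t stride,
                                         int32_t *dst, int width, int height) {
    assert(width % 8 == 0);
    for (int r = 0; r < height; r++) {
        const int16_t *in  = src + r * stride;
        int32_t       *out = dst + r * width;
        for (int c = 0; c < width; c += 8) {
            const __m128i v16 = _mm_loadu_si128((const __m128i *)(in + c));
            const __m256i v32 = _mm256_cvtepi16_epi32(v16);
            _mm256_storeu_si256((__m256i *)(out + c), v32);
        }
    }
}

/* Dispatcher for the sketch: use AVX2 when the CPU supports it and the width
 * fits the vector loop, otherwise fall back to the scalar reference. */
static void copy_s16_strided_to_s32(const int16_t *src, ptrdiff_t stride,
                                    int32_t *dst, int width, int height) {
    if (width % 8 == 0 && __builtin_cpu_supports("avx2"))
        copy_s16_strided_to_s32_avx2(src, stride, dst, width, height);
    else
        copy_s16_strided_to_s32_c(src, stride, dst, width, height);
}
```

Unaligned loads and stores are used so the sketch makes no assumption about buffer alignment. Doing the widening inside the transform's load step would avoid a separate scalar pass over the block.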