Speed up handling of local buffers in build_inter_predictors_8x8_and_bigger()
This MR avoids the memset of the MV offset buffers (i.e., the vx/vy buffers), since they are populated and used only for the required sub-blocks. In addition, the temporary buffers used to store MV offsets, gradient information, and OPFL prediction data are moved from the stack to heap memory.
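The idea can be illustrated with a minimal sketch. This is not the code from this MR: `OpflScratch`, `opfl_scratch_alloc()`, `opfl_scratch_free()`, `refine_block()`, and `MAX_SUB_BLOCKS` are hypothetical names, and the values written to vx/vy are stand-ins for the real offset derivation. It only shows why the memset is unnecessary when every entry that is read has been written first, and how the scratch buffers can be allocated on the heap once and reused.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SUB_BLOCKS 64 /* hypothetical upper bound on sub-blocks per block */

/* Hypothetical scratch area: allocated once on the heap and reused across
 * blocks, instead of being declared (and cleared) on the stack per call. */
typedef struct {
  int *vx; /* horizontal MV offset per sub-block */
  int *vy; /* vertical MV offset per sub-block */
} OpflScratch;

static int opfl_scratch_alloc(OpflScratch *s) {
  s->vx = (int *)malloc(MAX_SUB_BLOCKS * sizeof(*s->vx));
  s->vy = (int *)malloc(MAX_SUB_BLOCKS * sizeof(*s->vy));
  if (s->vx == NULL || s->vy == NULL) {
    free(s->vx);
    free(s->vy);
    s->vx = s->vy = NULL;
    return -1;
  }
  return 0;
}

static void opfl_scratch_free(OpflScratch *s) {
  free(s->vx);
  free(s->vy);
  s->vx = s->vy = NULL;
}

/* Writes the first n entries, then reads only those n entries.
 * No memset is needed: stale values beyond index n-1 are never read. */
static int refine_block(OpflScratch *s, int n) {
  assert(n >= 0 && n <= MAX_SUB_BLOCKS);
  for (int i = 0; i < n; ++i) {
    s->vx[i] = i + 1;    /* stand-in for the derived horizontal offset */
    s->vy[i] = -(i + 1); /* stand-in for the derived vertical offset */
  }
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += s->vx[i] - s->vy[i];
  return sum;
}
```

Calling `refine_block()` with a smaller `n` after a larger one gives the same result as on a fresh buffer, because the leftover entries from the earlier call are outside the range that is read.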
Encode time reduction:
- For the GCC compiler, 1.247% overall; 1.776% for Qp110 & 135.
- For the CLANG compiler, <= 0.5%.
Decode time reductions of ~18% and ~4% are seen for the GCC and CLANG compilers, respectively, across all test sets.
No stats change.