NEON port of svt_av1_highbd_convolve_2d_sr_c
Description
Issue
NEON port of svt_av1_highbd_convolve_2d_sr_c from the SSSE3 implementation.
The system under test is a c7g.4xlarge AWS Graviton instance.
The measurements below were obtained as follows:
wget http://ultravideo.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z x Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp -i Bosphorus_3840x2160.y4m --crf [C] --preset [P]
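The measurement loop can be sketched as below. This is a minimal sketch, not the authors' script: it assumes "e2e 3x encoding time" means the wall-clock time of three back-to-back runs, and the helper names `build_command` and `time_runs` are hypothetical. Only the flags shown in the command above (`-i`, `--crf`, `--preset`) are used.

```python
import subprocess
import time

PRESETS = [4, 6, 8, 10, 12, 13]  # presets that appear in the results table
CRF = 30                         # crf used for every row of the table


def build_command(input_path, crf, preset, encoder="SvtAv1EncApp"):
    """Return the encoder invocation from the description as an argv list."""
    return [encoder, "-i", input_path, "--crf", str(crf), "--preset", str(preset)]


def time_runs(command, runs=3, runner=subprocess.run):
    """Run `command` `runs` times in a row and return total wall-clock seconds.

    ASSUMPTION: "3x" is read here as three consecutive runs timed together.
    `runner` is injectable so the timing logic can be exercised without the encoder.
    """
    start = time.perf_counter()
    for _ in range(runs):
        runner(command, check=True)  # raises if the encoder exits non-zero
    return time.perf_counter() - start
```

Calling `time_runs(build_command("Bosphorus_3840x2160.y4m", CRF, p))` for each `p` in `PRESETS`, once with the baseline binary and once with the patched one, would give one Before/After pair per preset.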
Improvement:
filename | preset | crf | e2e 3x encoding time (After) | e2e 3x encoding time (Before) | Improv. |
---|---|---|---|---|---|
Bosphorus_3840x2160.y4m | 4 | 30 | 147.18 | 146.46 | -0.49% |
Bosphorus_3840x2160.y4m | 6 | 30 | 51.38 | 51.15 | -0.45% |
Bosphorus_3840x2160.y4m | 8 | 30 | 24.41 | 24.20 | -0.86% |
Bosphorus_3840x2160.y4m | 10 | 30 | 11.34 | 11.28 | -0.53% |
Bosphorus_3840x2160.y4m | 12 | 30 | 7.60 | 7.57 | -0.39% |
Bosphorus_3840x2160.y4m | 13 | 30 | 7.61 | 7.59 | -0.26% |
Bosphorus_1920x1080.y4m | 4 | 30 | 70.08 | 70.23 | 0.21% |
Bosphorus_1920x1080.y4m | 6 | 30 | 25.89 | 26.03 | 0.54% |
Bosphorus_1920x1080.y4m | 8 | 30 | 13.12 | 13.08 | -0.30% |
Bosphorus_1920x1080.y4m | 10 | 30 | 5.75 | 5.71 | -0.70% |
Bosphorus_1920x1080.y4m | 12 | 30 | 2.90 | 2.89 | -0.34% |
Bosphorus_1920x1080.y4m | 13 | 30 | 2.42 | 2.39 | -1.24% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 4 | 30 | 97.89 | 99.62 | 1.77% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 6 | 30 | 96.43 | 98.57 | 2.22% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 8 | 30 | 97.49 | 98.59 | 1.13% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 10 | 30 | 26.12 | 26.23 | 0.42% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 12 | 30 | 14.07 | 14.28 | 1.49% |
jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 13 | 30 | 14.96 | 15.08 | 0.80% |
jellyfish-1080p-hevc-10bit.yuv | 4 | 30 | 75.99 | 78.36 | 3.12% |
jellyfish-1080p-hevc-10bit.yuv | 6 | 30 | 76.30 | 77.98 | 2.20% |
jellyfish-1080p-hevc-10bit.yuv | 8 | 30 | 76.69 | 78.59 | 2.48% |
jellyfish-1080p-hevc-10bit.yuv | 10 | 30 | 21.71 | 22.21 | 2.30% |
jellyfish-1080p-hevc-10bit.yuv | 12 | 30 | 14.21 | 14.40 | 1.34% |
jellyfish-1080p-hevc-10bit.yuv | 13 | 30 | 12.63 | 12.74 | 0.87% |
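The description does not state how the "Improv." column is derived. The listed values appear consistent with (Before - After) / After, so a positive number means the encoder ran faster after the change. The sketch below recomputes three rows under that assumption; the function name `improvement_percent` is hypothetical.

```python
def improvement_percent(after, before):
    """Relative change of `before` with respect to `after`, in percent.

    ASSUMPTION: this formula is inferred from the table values, not stated
    in the description. Positive means the "After" run was faster.
    """
    return (before - after) / after * 100.0


# (After, Before) pairs copied from the table above.
rows = [
    (147.18, 146.46),  # Bosphorus 3840x2160, preset 4
    (75.99, 78.36),    # jellyfish 1080p 10bit, preset 4
    (2.42, 2.39),      # Bosphorus 1920x1080, preset 13
]
for after, before in rows:
    print(f"{improvement_percent(after, before):.2f}%")
# prints -0.49%, 3.12%, -1.24%
```

Under this reading, the 8-bit Bosphorus rows are mostly slightly negative and all under 1.3% in magnitude, while every 10-bit jellyfish row is positive, which is where a high-bit-depth kernel would be expected to matter.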
Author(s)
Rodrigo Causarano (@rjcausarano), Gerardo Puga (@glpuga)
Performance impact
- quality
- memory
- speed
- 8 bit
- 10 bit
- N/A
Test set
- obj-1-fast can be found here
- other
- N/A
Merge method
- Allow the maintainer to squash and merge when PR is ready to create a 1-commit to the master branch. The maintainer will be able to fix typos / combine commit messages to create a more readable 1-commit message or use whatever is stated in the 'Description' section
- I will clean up my commits and the maintainer shall use 'rebase and merge' to the master branch