NEON ports of svt_av1_inv_txfm2d_add_16x16_c and svt_av1_inv_txfm2d_add_32x32_c and svt_av1_inv_txfm2d_add_64x64_c

Description

Issue

Adds NEON ports of the svt_av1_inv_txfm2d_add_16x16_c, svt_av1_inv_txfm2d_add_32x32_c, and svt_av1_inv_txfm2d_add_64x64_c functions, based on the SSE4.1 implementations.

svt_av1_inv_txfm2d_add_16x16_neon based on this commit

svt_av1_inv_txfm2d_add_32x32_neon based on this commit

svt_av1_inv_txfm2d_add_64x64_neon based on this commit

The system under test is a c7g.4xlarge AWS Graviton instance.

The combined effect of the three functions was measured as follows:

wget http://ultravideo.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z x Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z

SvtAv1EncApp -i Bosphorus_3840x2160.y4m --crf [C] --preset [P]
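
The column names in the result tables (User time (seconds), System time (seconds), Percent of CPU this job got, Maximum resident set size (kbytes)) match the labels printed by GNU time in verbose mode, so the runs were presumably wrapped in `/usr/bin/time -v`; the description does not say so explicitly. Under that assumption, a minimal Python sketch for turning one such report into a table row could look like this (`parse_time_verbose` is a hypothetical helper, not part of SVT-AV1):

```python
# Assumption: the report text comes from GNU time in verbose mode
# (`/usr/bin/time -v <command>`), which prints one "label: value" pair per line.
FIELDS = {
    "User time (seconds)": float,
    "System time (seconds)": float,
    "Percent of CPU this job got": lambda s: float(s.rstrip("%")),
    "Maximum resident set size (kbytes)": int,
}


def parse_time_verbose(report: str) -> dict:
    """Extract the four resource columns used in the tables from a report."""
    row = {}
    for line in report.splitlines():
        # Split on the first ": " so labels that contain bare colons
        # (such as the elapsed-time line) are not cut in the middle.
        label, sep, value = line.strip().partition(": ")
        if sep and label in FIELDS:
            row[label] = FIELDS[label](value.strip())
    return row


# Illustrative report fragment built from the first row of the After table.
sample = (
    "\tUser time (seconds): 726.67\n"
    "\tSystem time (seconds): 5.43\n"
    "\tPercent of CPU this job got: 1146%\n"
    "\tMaximum resident set size (kbytes): 5279508\n"
)
print(parse_time_verbose(sample))
```

The Average Speed and Loaded runtime columns are not produced by GNU time, so this sketch leaves them out.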

After

| filename | preset | crf | User time (seconds) | System time (seconds) | Percent of CPU this job got | Maximum resident set size (kbytes) | Average Speed | Loaded runtime |
|---|---|---|---|---|---|---|---|---|
| Bosphorus_3840x2160.y4m | 4 | 30 | 726.67 | 5.43 | 1146.00% | 5279508 | 2.388 | 146.81 |
| Bosphorus_3840x2160.y4m | 6 | 30 | 250.52 | 2.45 | 1049.00% | 5063080 | 6.507 | 51.46 |
| Bosphorus_3840x2160.y4m | 8 | 30 | 116.82 | 1.64 | 1023.00% | 4278196 | 13.999 | 24.32 |
| Bosphorus_3840x2160.y4m | 10 | 30 | 52.41 | 1.57 | 1205.00% | 3602100 | 39.897 | 11.31 |
| Bosphorus_3840x2160.y4m | 12 | 30 | 33.97 | 1.32 | 1078.00% | 3334380 | 57.25 | 7.6 |
| Bosphorus_3840x2160.y4m | 13 | 30 | 34.05 | 1.24 | 1079.00% | 3332672 | 57.315 | 7.56 |
| Bosphorus_1920x1080.y4m | 4 | 30 | 349.93 | 1.96 | 1155.00% | 2144340 | 9.997 | 70.15 |
| Bosphorus_1920x1080.y4m | 6 | 30 | 125.86 | 1.17 | 1119.00% | 2102704 | 27.535 | 25.62 |
| Bosphorus_1920x1080.y4m | 8 | 30 | 62.68 | 0.94 | 962.00% | 1924628 | 48.308 | 13.04 |
| Bosphorus_1920x1080.y4m | 10 | 30 | 26.99 | 0.71 | 1216.00% | 1146116 | 148.407 | 5.69 |
| Bosphorus_1920x1080.y4m | 12 | 30 | 12.74 | 0.7 | 1019.00% | 989744 | 273.317 | 2.84 |
| Bosphorus_1920x1080.y4m | 13 | 30 | 10.40 | 0.59 | 982.00% | 977680 | 329.371 | 2.43 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 4 | 30 | 482.32 | 0.86 | 734.00% | 1387080 | 2.331 | 98.96 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 6 | 30 | 482.04 | 0.84 | 734.00% | 1414476 | 2.333 | 97.87 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 8 | 30 | 482.46 | 0.74 | 735.00% | 1396044 | 2.334 | 98.61 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 10 | 30 | 125.80 | 0.73 | 575.00% | 1169184 | 6.995 | 26.22 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 12 | 30 | 52.49 | 0.56 | 408.00% | 1137756 | 11.906 | 14.33 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 13 | 30 | 49.30 | 0.57 | 349.00% | 1162180 | 10.817 | 15.01 |
| jellyfish-1080p-hevc-10bit.yuv | 4 | 30 | 386.96 | 0.74 | 646.00% | 457700 | 7.515 | 77.74 |
| jellyfish-1080p-hevc-10bit.yuv | 6 | 30 | 386.94 | 0.6 | 647.00% | 471080 | 7.522 | 77.83 |
| jellyfish-1080p-hevc-10bit.yuv | 8 | 30 | 387.26 | 0.69 | 647.00% | 458920 | 7.515 | 78.31 |
| jellyfish-1080p-hevc-10bit.yuv | 10 | 30 | 107.94 | 0.84 | 613.00% | 382488 | 25.433 | 22.08 |
| jellyfish-1080p-hevc-10bit.yuv | 12 | 30 | 50.94 | 0.74 | 377.00% | 363032 | 32.947 | 14.43 |
| jellyfish-1080p-hevc-10bit.yuv | 13 | 30 | 43.99 | 0.67 | 364.00% | 357908 | 36.883 | 12.75 |

Before

| filename | preset | crf | User time (seconds) | System time (seconds) | Percent of CPU this job got | Maximum resident set size (kbytes) | Average Speed | Loaded runtime |
|---|---|---|---|---|---|---|---|---|
| Bosphorus_3840x2160.y4m | 4 | 30 | 722.04 | 5.68 | 1145.00% | 5265168 | 2.4 | 146.4 |
| Bosphorus_3840x2160.y4m | 6 | 30 | 250.26 | 2.75 | 1050.00% | 4999256 | 6.512 | 51.06 |
| Bosphorus_3840x2160.y4m | 8 | 30 | 116.70 | 1.56 | 1024.00% | 4273956 | 14.02 | 24.48 |
| Bosphorus_3840x2160.y4m | 10 | 30 | 52.52 | 1.49 | 1206.00% | 3618116 | 39.747 | 11.28 |
| Bosphorus_3840x2160.y4m | 12 | 30 | 33.78 | 1.49 | 1075.00% | 3324704 | 57.04 | 7.58 |
| Bosphorus_3840x2160.y4m | 13 | 30 | 33.84 | 1.39 | 1074.00% | 3336404 | 57.224 | 7.57 |
| Bosphorus_1920x1080.y4m | 4 | 30 | 348.91 | 1.98 | 1160.00% | 2150908 | 10.066 | 70.27 |
| Bosphorus_1920x1080.y4m | 6 | 30 | 125.61 | 1.32 | 1118.00% | 2077612 | 27.569 | 25.74 |
| Bosphorus_1920x1080.y4m | 8 | 30 | 62.53 | 1.08 | 961.00% | 1921316 | 48.208 | 13.02 |
| Bosphorus_1920x1080.y4m | 10 | 30 | 26.85 | 0.8 | 1201.00% | 1132408 | 146.775 | 5.71 |
| Bosphorus_1920x1080.y4m | 12 | 30 | 12.74 | 0.59 | 1018.00% | 985352 | 276.139 | 2.83 |
| Bosphorus_1920x1080.y4m | 13 | 30 | 10.32 | 0.65 | 984.00% | 971432 | 331.657 | 2.4 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 4 | 30 | 540.85 | 0.87 | 748.00% | 1414512 | 2.119 | 109.68 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 6 | 30 | 541.04 | 0.74 | 751.00% | 1378556 | 2.125 | 109.74 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 8 | 30 | 541.09 | 0.76 | 751.00% | 1409128 | 2.126 | 109.68 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 10 | 30 | 159.72 | 0.68 | 625.00% | 1141892 | 5.99 | 32.79 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 12 | 30 | 77.25 | 0.44 | 496.00% | 1138108 | 9.846 | 18.05 |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 13 | 30 | 67.44 | 0.77 | 422.00% | 1166720 | 9.535 | 17.4 |
| jellyfish-1080p-hevc-10bit.yuv | 4 | 30 | 431.06 | 0.57 | 654.00% | 469708 | 6.832 | 87 |
| jellyfish-1080p-hevc-10bit.yuv | 6 | 30 | 430.56 | 0.66 | 653.00% | 459800 | 6.83 | 87.12 |
| jellyfish-1080p-hevc-10bit.yuv | 8 | 30 | 430.51 | 0.71 | 654.00% | 470820 | 6.84 | 87.37 |
| jellyfish-1080p-hevc-10bit.yuv | 10 | 30 | 139.99 | 0.60 | 648.00% | 383564 | 20.805 | 28.66 |
| jellyfish-1080p-hevc-10bit.yuv | 12 | 30 | 67.20 | 0.44 | 425.00% | 362644 | 28.373 | 17.19 |
| jellyfish-1080p-hevc-10bit.yuv | 13 | 30 | 55.27 | 0.46 | 418.00% | 358100 | 33.878 | 14.36 |

Speedup

| filename | preset | crf | User time (seconds), After | User time (seconds), Before | Improv. |
|---|---|---|---|---|---|
| Bosphorus_3840x2160.y4m | 4 | 30 | 726.67 | 722.04 | -0.64% |
| Bosphorus_3840x2160.y4m | 6 | 30 | 250.52 | 250.26 | -0.10% |
| Bosphorus_3840x2160.y4m | 8 | 30 | 116.82 | 116.70 | -0.10% |
| Bosphorus_3840x2160.y4m | 10 | 30 | 52.41 | 52.52 | 0.21% |
| Bosphorus_3840x2160.y4m | 12 | 30 | 33.97 | 33.78 | -0.56% |
| Bosphorus_3840x2160.y4m | 13 | 30 | 34.05 | 33.84 | -0.62% |
| Bosphorus_1920x1080.y4m | 4 | 30 | 349.93 | 348.91 | -0.29% |
| Bosphorus_1920x1080.y4m | 6 | 30 | 125.86 | 125.61 | -0.20% |
| Bosphorus_1920x1080.y4m | 8 | 30 | 62.68 | 62.53 | -0.24% |
| Bosphorus_1920x1080.y4m | 10 | 30 | 26.99 | 26.85 | -0.52% |
| Bosphorus_1920x1080.y4m | 12 | 30 | 12.74 | 12.74 | 0.00% |
| Bosphorus_1920x1080.y4m | 13 | 30 | 10.40 | 10.32 | -0.77% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 4 | 30 | 482.32 | 540.85 | 12.14% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 6 | 30 | 482.04 | 541.04 | 12.24% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 8 | 30 | 482.46 | 541.09 | 12.15% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 10 | 30 | 125.80 | 159.72 | 26.96% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 12 | 30 | 52.49 | 77.25 | 47.17% |
| jellyfish-400-mbps-4k-uhd-hevc-10bit.yuv | 13 | 30 | 49.30 | 67.44 | 36.80% |
| jellyfish-1080p-hevc-10bit.yuv | 4 | 30 | 386.96 | 431.06 | 11.40% |
| jellyfish-1080p-hevc-10bit.yuv | 6 | 30 | 386.94 | 430.56 | 11.27% |
| jellyfish-1080p-hevc-10bit.yuv | 8 | 30 | 387.26 | 430.51 | 11.17% |
| jellyfish-1080p-hevc-10bit.yuv | 10 | 30 | 107.94 | 139.99 | 29.69% |
| jellyfish-1080p-hevc-10bit.yuv | 12 | 30 | 50.94 | 67.20 | 31.92% |
| jellyfish-1080p-hevc-10bit.yuv | 13 | 30 | 43.99 | 55.27 | 25.64% |
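
The description does not state how the Improv. column is computed. The figures are consistent with the reduction in user time taken relative to the After value, that is (Before - After) / After. A small sketch under that inferred assumption (`improvement_percent` is a hypothetical name used only for illustration):

```python
def improvement_percent(before: float, after: float) -> float:
    """Change in user time relative to the 'after' measurement, in percent.

    Positive values mean the patched build used less user time.
    Assumption: this formula is inferred from the table, not given by the authors.
    """
    return (before - after) / after * 100.0


# jellyfish 4k, preset 4: Before = 540.85 s, After = 482.32 s
print(f"{improvement_percent(540.85, 482.32):.2f}%")  # prints 12.14%
```

Because the denominator is the After value, the figure reads as "how much longer the old build took"; dividing by Before instead would give a smaller number (about 10.8% for the same row).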

Author(s)

Rodrigo Causarano (@rjcausarano), Gerardo Puga (@glpuga)

Performance impact

  • quality
  • memory
  • speed
  • 8 bit
  • 10 bit
  • N/A

Test set

  • obj-1-fast can be found here
  • other
  • N/A

Merge method

  • Allow the maintainer to squash and merge when PR is ready to create a 1-commit to the master branch. The maintainer will be able to fix typos / combine commit messages to create a more readable 1-commit message or use whatever is stated in the 'Description' section
  • I will clean up my commits and the maintainer shall use 'rebase and merge' to the master branch
Edited by Rodrigo Causarano
