Commits on Source (15)
-
The high bitdepth loopfilter functions are specialized for filtering scenarios in which the mask values indicate that not every filter size needs to be computed. This optimization will also be added to the standard bit-depth implementation in future commits. The unit test inputs cover only a small subset of these scenarios, so use the function from the libaom loopfilter unit tests to generate inputs that cover all filtering cases.
-
The Neon implementations of svt_aom_lpf_vertical_6 and svt_aom_lpf_horizontal_6 compute both filter4() and filter6() before selecting, for each element, which filter is actually needed. In practice, however, many cases only need one of the filters, so specialize for these scenarios, computing only the filters that are needed and eliminating the bitwise select. This makes the case where all filters are needed slightly slower, but as it is far from the most common case this is acceptable. Also move the actual filter computation to separate functions to avoid code duplication.
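The dispatch described in this commit message can be sketched in scalar C. This is only an illustration under stated assumptions, not the Neon code from the commit: `toy_filter4()` and `toy_filter6()` are hypothetical stand-ins for the real filters, masks are modelled as one byte per element (0x00 or 0xFF), and `flat` is assumed to be already ANDed with `mask`.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };

/* Hypothetical stand-ins for filter4()/filter6(); the real filters operate
 * on several pixels on each side of the edge. */
static uint8_t toy_filter4(uint8_t p) { return (uint8_t)(p / 2); }
static uint8_t toy_filter6(uint8_t p) { return (uint8_t)(p / 4); }

/* Returns 1 if every lane of the mask equals `value`. */
static int all_lanes_equal(const uint8_t *mask, uint8_t value) {
    for (size_t i = 0; i < LANES; i++) {
        if (mask[i] != value) return 0;
    }
    return 1;
}

/* mask: lanes that need filtering at all.
 * flat: lanes that need the wider filter (assumed to be a subset of mask). */
static void lpf6_sketch(uint8_t *px, const uint8_t *mask, const uint8_t *flat) {
    if (all_lanes_equal(mask, 0x00)) {
        return; /* nothing to filter */
    }
    if (all_lanes_equal(flat, 0x00)) {
        /* filter4-only case: filter6 is never computed */
        for (size_t i = 0; i < LANES; i++) {
            if (mask[i]) px[i] = toy_filter4(px[i]);
        }
        return;
    }
    if (all_lanes_equal(flat, 0xFF)) {
        /* filter6-only case: filter4 is never computed, no select needed */
        for (size_t i = 0; i < LANES; i++) {
            px[i] = toy_filter6(px[i]);
        }
        return;
    }
    /* Mixed case: compute both filters and select per lane. */
    for (size_t i = 0; i < LANES; i++) {
        const uint8_t f4 = mask[i] ? toy_filter4(px[i]) : px[i];
        const uint8_t f6 = toy_filter6(px[i]);
        px[i] = flat[i] ? f6 : f4;
    }
}
```

Only the mixed branch pays for both filters plus the select, which matches the trade-off the commit message describes.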
-
The Neon implementations of svt_aom_lpf_vertical_8 and svt_aom_lpf_horizontal_8 compute both filter4() and filter8() before selecting, for each element, which filter is actually needed. In practice, however, many cases only need one of the filters, so specialize for these scenarios, computing only the filters that are needed and eliminating the bitwise select. This makes the case where all filters are needed slightly slower, but as it is far from the most common case this is acceptable. Also move the actual filter computation to separate functions to avoid code duplication.
-
The Neon implementations of svt_aom_lpf_vertical_14 and svt_aom_lpf_horizontal_14 compute every filter (filter4(), filter8() and filter14()) before selecting, for each element, which filter is actually needed. In practice, however, many cases only need one of the filters, so specialize for these scenarios, computing only the filters that are needed and eliminating the bitwise select. This makes the case where all filters are needed slightly slower, but as it is far from the most common case this is acceptable. Also move the actual filter computation to separate functions to avoid code duplication.
-
Use the existing helper to compute filter4() and return if filtering isn't needed.
-
Fix the condition for the filter8_only case for svt_aom_highbd_lpf_horizontal_14_neon and svt_aom_highbd_lpf_vertical_14_neon. Also take the opportunity to make other if conditions slightly nicer, avoiding the use of horizontal adds completely.
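One way to test a whole mask without a horizontal add is to view it as a single 64-bit scalar. The sketch below shows this in plain C, assuming a mask of four 16-bit lanes that are each 0x0000 or 0xFFFF. The helper names are hypothetical and the actual conditions used in the commit are not shown here. With Neon, the equivalent read would be something like `vget_lane_u64(vreinterpret_u64_u16(mask), 0)`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view four 16-bit mask lanes as one 64-bit value. */
static uint64_t mask_as_u64(const uint16_t *mask) {
    uint64_t bits;
    memcpy(&bits, mask, sizeof(bits)); /* copies exactly 4 * 16 bits */
    return bits;
}

/* No lane is set: the corresponding filter can be skipped entirely. */
static int mask_is_all_zero(const uint16_t *mask) {
    return mask_as_u64(mask) == 0;
}

/* Every lane is set: only this filter is needed, so no select is required. */
static int mask_is_all_set(const uint16_t *mask) {
    return mask_as_u64(mask) == UINT64_MAX;
}
```

Both checks are a single scalar comparison, and neither depends on lane order, so the result is the same regardless of endianness.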
-
CFL prediction can only happen on blocks where max(width, height) <= 32. Additionally, svt_cfl_predict_lbd and svt_cfl_predict_hbd can only take values of width coming from the tx_size_wide[] array. That array doesn't include width = 2, which the SIMD implementations don't cater for anyway. Skip all the invalid block sizes in the unit tests.
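The size filter described above could look roughly like the following. `cfl_test_size_is_valid()` is a hypothetical helper, not a function from SVT-AV1 or its tests, and the list of widths assumes tx_size_wide[] holds only the values 4, 8, 16, 32 and 64.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if a (width, height) pair is a block
 * size that CFL prediction can actually be called with. */
static bool cfl_test_size_is_valid(int width, int height) {
    const int max_dim = width > height ? width : height;
    if (max_dim > 32) {
        return false; /* CFL requires max(width, height) <= 32 */
    }
    /* Assumed tx_size_wide[] values that survive the <= 32 limit;
     * width = 2 is not among them. */
    return width == 4 || width == 8 || width == 16 || width == 32;
}
```

A test loop over candidate sizes would then simply `continue` when this returns false.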
-
Interleaving is not necessary, and interleaving loads/stores can be very slow. The vst1/vld1_x2/x3/x4 intrinsics are now supported by all modern compilers.
-
Port from libaom the Neon implementation of svt_cfl_predict_hbd and add the corresponding unit tests.
-
Port from libaom the Neon implementation of svt_av1_upsample_intra_edge and add the corresponding unit tests.
-
Refactor SelfGuidedUtilTest.cc by having one class per function tested and one test instantiation per architecture extension. Add test cases for different values of width, because a subsequent patch adds a Neon implementation that only supports width % 8 == 0.
-
Port the libaom Neon implementation of svt_get_proj_subspace and add the unit tests.
-
CFL prediction can only happen on blocks where max(width, height) <= 32, so skip all the invalid block sizes.
-
Add a Neon implementation of svt_cfl_luma_subsampling_lbd, and add the unit tests.
-
Add a Neon implementation of svt_cfl_luma_subsampling_420_hbd and add the unit tests.
Showing
- Source/Lib/ASM_NEON/CMakeLists.txt 1 addition, 0 deletions
- Source/Lib/ASM_NEON/cfl_neon.c 209 additions, 10 deletions
- Source/Lib/ASM_NEON/deblocking_filter_intrinsic_neon.c 344 additions, 402 deletions
- Source/Lib/ASM_NEON/highbd_loopfilter_neon.c 15 additions, 9 deletions
- Source/Lib/ASM_NEON/intra_prediction_neon.c 40 additions, 0 deletions
- Source/Lib/ASM_NEON/mem_neon.h 14 additions, 0 deletions
- Source/Lib/ASM_NEON/restoration_pick_neon.c 551 additions, 0 deletions
- Source/Lib/Codec/aom_dsp_rtcd.c 1 addition, 1 deletion
- Source/Lib/Codec/aom_dsp_rtcd.h 1 addition, 1 deletion
- Source/Lib/Codec/common_dsp_rtcd.c 4 additions, 4 deletions
- Source/Lib/Codec/common_dsp_rtcd.h 5 additions, 0 deletions
- test/DeblockTest.cc 63 additions, 8 deletions
- test/SelfGuidedUtilTest.cc 159 additions, 216 deletions
- test/intrapred_cfl_test.cc 34 additions, 13 deletions
- test/intrapred_edge_filter_test.cc 5 additions, 1 deletion
Source/Lib/ASM_NEON/restoration_pick_neon.c
0 → 100644