Major performance regression for 10-bit encoding on Windows

I was testing the efficiency changes going from SVT-AV1 v1.1.0 to SVT-AV1 v1.2.1 on Windows, and found that 10-bit encoding took a major performance hit. I ran a git bisect to find the commit responsible, and found that the regression was introduced in commit 1d842ee6 as part of PR !1947.

From my testing, 8-bit encoding performance stayed the same, but 10-bit encoding dropped to as little as roughly one-third of its speed before the commit, depending on the preset used. I tested presets 6 through 12 on an AMD Ryzen 9 5950X, using both 8-bit and 10-bit encoding of an 8-bit source; I lack a 10-bit source to test with. I checked with both tune=0 and tune=1 and found similar performance regressions, so I'm only showing the results for tune=0 here.

I didn't test any preset below 6 for the sake of time, so I'm not sure whether slower presets are affected; I assume they would take just as much of a hit. I compiled targeting the Release build using Visual Studio 2022, though I was also able to reproduce this behavior using libsvtav1 in FFmpeg v5.1.1 via the gyan.dev Windows builds.

I also tested this on Linux using WSL Ubuntu 20.04 and didn't see the same performance hit between bit depths. So to be clear, this seems to be an issue exclusive to Windows. The source doesn't seem to matter, but if you want to specifically use mine, you can download it here.

8-bit Command: ffmpeg -i G:\lossless\tf2_lossless.avi -nostdin -f yuv4mpegpipe - | .\SvtAv1EncApp.exe -i stdin --input-depth 8 --preset 7 --crf 14 --tune 0

8-Bit Example Output:

Svt[info]: Number of logical cores available: 32
Svt[info]: Number of PPCS 88
Svt[info]: [asm level on system : up to avx2]
Svt[info]: [asm level selected : up to avx2]
Svt[info]: -------------------------------------------
Svt[info]: SVT [config]: main profile   tier (auto)     level (auto)
Svt[info]: SVT [config]: width / height / fps numerator / fps denominator               : 1920 / 1080 / 60 / 1
Svt[info]: SVT [config]: bit-depth / color format / compressed 10-bit format            : 8 / YUV420 / 0
Svt[info]: SVT [config]: preset / tune / pred struct                                    : 7 / VQ / random access
Svt[info]: SVT [config]: gop size / mini-gop size / key-frame type                      : 321 / 16 / key frame
Svt[info]: SVT [config]: BRC mode / rate factor                                         : CRF / 14
Svt[info]: -------------------------------------------

SUMMARY --------------------------------- Channel 1  --------------------------------
Total Frames            Frame Rate              Byte Count              Bitrate
        1801            60.00 fps                196025484              52244.44 kbps

Channel 1
Average Speed:          30.016 fps
Total Encoding Time:    60001 ms
Total Execution Time:   61001 ms
Average Latency:        3153 ms
Max Latency:            5000 ms

10-bit Command: ffmpeg -i G:\lossless\tf2_lossless.avi -nostdin -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 - | .\SvtAv1EncApp.exe -i stdin --input-depth 10 --preset 7 --crf 14 --tune 0

10-Bit Example Output:

Svt[info]: Number of logical cores available: 32
Svt[info]: Number of PPCS 88
Svt[info]: [asm level on system : up to avx2]
Svt[info]: [asm level selected : up to avx2]
Svt[info]: -------------------------------------------
Svt[info]: SVT [config]: main profile   tier (auto)     level (auto)
Svt[info]: SVT [config]: width / height / fps numerator / fps denominator               : 1920 / 1080 / 60 / 1
Svt[info]: SVT [config]: bit-depth / color format / compressed 10-bit format            : 10 / YUV420 / 0
Svt[info]: SVT [config]: preset / tune / pred struct                                    : 7 / VQ / random access
Svt[info]: SVT [config]: gop size / mini-gop size / key-frame type                      : 321 / 16 / key frame
Svt[info]: SVT [config]: BRC mode / rate factor                                         : CRF / 14
Svt[info]: -------------------------------------------

SUMMARY --------------------------------- Channel 1  --------------------------------
Total Frames            Frame Rate              Byte Count              Bitrate
        1801            60.00 fps                205898009              54875.65 kbps

Channel 1
Average Speed:          10.118 fps
Total Encoding Time:    178000 ms
Total Execution Time:   178001 ms
Average Latency:        9006 ms
Max Latency:            14000 ms
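
For anyone repeating these runs across several presets, the speed figure can be pulled out of the summary automatically. The following is a minimal sketch, assuming the summary has been captured into a string with the same layout as the output above; `parse_summary` is a hypothetical helper name, not part of SVT-AV1. It also cross-checks the reported speed against frames divided by encoding time.

```python
import re

def parse_summary(log_text):
    """Extract frame count, average speed (fps), and encoding time (ms) from a summary block."""
    speed = re.search(r"Average Speed:\s+([\d.]+)\s+fps", log_text)
    enc_ms = re.search(r"Total Encoding Time:\s+(\d+)\s+ms", log_text)
    # The row right after the "Total Frames" header starts with the frame count.
    frames = re.search(r"Total Frames.*\n\s*(\d+)", log_text)
    if not (speed and enc_ms and frames):
        raise ValueError("summary block not found in log text")
    return int(frames.group(1)), float(speed.group(1)), int(enc_ms.group(1))

# Excerpt of the 10-bit summary shown above.
sample = """\
Total Frames            Frame Rate              Byte Count              Bitrate
        1801            60.00 fps                205898009              54875.65 kbps

Channel 1
Average Speed:          10.118 fps
Total Encoding Time:    178000 ms
"""

frames, fps, enc_ms = parse_summary(sample)
# Frames divided by encoding time in seconds should agree with the reported speed.
derived_fps = frames / (enc_ms / 1000)
print(frames, fps, round(derived_fps, 3))
```

The reported and derived speeds agree for this run, which suggests the "Average Speed" line is simply total frames over total encoding time.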

Commit 0ba91fc4 (the last known good commit) Results:

Preset  Bit Depth  FPS
6       8          15.526
6       10         14.408
7       8          30.525
7       10         26.88
8       8          66.703
8       10         58.094
9       8          100.052
9       10         78.304
10      8          128.642
10      10         100.055
11      8          138.536
11      10         112.561
12      8          180.088
12      10         138.528

Commit 1d842ee6 (the first known bad commit) Results:

Preset  Bit Depth  FPS
6       8          15.263
6       10         5.772
7       8          31.051
7       10         10.175
8       8          66.703
8       10         27.707
9       8          100.053
9       10         26.485
10      8          128.641
10      10         33.981
11      8          138.535
11      10         60.033
12      8          180.087
12      10         94.789

Here's a bar graph that shows the relative performance change for each preset and each bit depth, in other words new FPS / old FPS.

Bar Graph showing the relative performance changes over the two commits for 8-bit vs. 10-bit encoding, for presets 6 through 12.
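
The ratios in the graph can be recomputed from the two result tables. This is a small sketch that copies the FPS figures from the tables for commits 0ba91fc4 and 1d842ee6; the variable and function names are illustrative only.

```python
# FPS figures copied from the two result tables: {preset: (8-bit FPS, 10-bit FPS)}
good = {  # commit 0ba91fc4 (last known good)
    6: (15.526, 14.408), 7: (30.525, 26.88), 8: (66.703, 58.094),
    9: (100.052, 78.304), 10: (128.642, 100.055), 11: (138.536, 112.561),
    12: (180.088, 138.528),
}
bad = {  # commit 1d842ee6 (first known bad)
    6: (15.263, 5.772), 7: (31.051, 10.175), 8: (66.703, 27.707),
    9: (100.053, 26.485), 10: (128.641, 33.981), 11: (138.535, 60.033),
    12: (180.087, 94.789),
}

def relative_change(old, new):
    """Return {preset: (8-bit ratio, 10-bit ratio)} where each ratio is new FPS / old FPS."""
    return {
        preset: tuple(n / o for n, o in zip(new[preset], old[preset]))
        for preset in old
    }

ratios = relative_change(good, bad)
for preset, (r8, r10) in sorted(ratios.items()):
    print(f"preset {preset:2d}: 8-bit {r8:.3f}  10-bit {r10:.3f}")

# Preset with the largest 10-bit slowdown (smallest ratio).
worst = min(ratios, key=lambda p: ratios[p][1])
print("worst 10-bit preset:", worst)
```

With these numbers the 8-bit ratios all stay within about 2% of 1.0, while the 10-bit ratio bottoms out at preset 9 at roughly one-third of the earlier speed.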

Edited by Christopher Robert Philabaum