Major performance regression for 10-bit encoding on Windows
I was testing the efficiency changes going from SVT-AV1 v1.1.0 to SVT-AV1 v1.2.1 on Windows, and found that for some reason the 10-bit encoding took a major hit in performance. I did a git bisect to figure out which commit this occurred in, and found that it regressed in commit 1d842ee6 as part of PR !1947 (merged).
From my testing, 8-bit encoding performance stayed the same, but 10-bit encoding dropped to as little as about one-third of its speed before the commit, depending on the preset used. I tested presets 6 to 12 on an AMD Ryzen 9 5950X, using both 8-bit and 10-bit encoding for an 8-bit source; I lack a 10-bit source to test with. I checked with both tune=0 and tune=1 and found similar performance regressions, so I'm only showing the results for tune=0 here.
I didn't test any preset below 6 for the sake of time, so I'm not sure whether this behavior also shows up in slower presets. I assume it would be just as much of a hit. I compiled the Release configuration using Visual Studio 2022, though I was also able to reproduce this behavior using libsvtav1 in FFmpeg v5.1.1 via the gyan.dev Windows builds.
I also tested this in Linux using WSL Ubuntu 20.04, and I didn't see the same performance hit between bit depths. So to be clear, this seems to be an issue exclusive to Windows. The source doesn't seem to matter, but if you want to specifically use mine, you can download it here.
8-bit Command: `ffmpeg -i G:\lossless\tf2_lossless.avi -nostdin -f yuv4mpegpipe - | .\SvtAv1EncApp.exe -i stdin --input-depth 8 --preset 7 --crf 14 --tune 0`
8-Bit Example Output:
Svt[info]: Number of logical cores available: 32
Svt[info]: Number of PPCS 88
Svt[info]: [asm level on system : up to avx2]
Svt[info]: [asm level selected : up to avx2]
Svt[info]: -------------------------------------------
Svt[info]: SVT [config]: main profile tier (auto) level (auto)
Svt[info]: SVT [config]: width / height / fps numerator / fps denominator : 1920 / 1080 / 60 / 1
Svt[info]: SVT [config]: bit-depth / color format / compressed 10-bit format : 8 / YUV420 / 0
Svt[info]: SVT [config]: preset / tune / pred struct : 7 / VQ / random access
Svt[info]: SVT [config]: gop size / mini-gop size / key-frame type : 321 / 16 / key frame
Svt[info]: SVT [config]: BRC mode / rate factor : CRF / 14
Svt[info]: -------------------------------------------
SUMMARY --------------------------------- Channel 1 --------------------------------
Total Frames Frame Rate Byte Count Bitrate
1801 60.00 fps 196025484 52244.44 kbps
Channel 1
Average Speed: 30.016 fps
Total Encoding Time: 60001 ms
Total Execution Time: 61001 ms
Average Latency: 3153 ms
Max Latency: 5000 ms
10-bit Command: `ffmpeg -i G:\lossless\tf2_lossless.avi -nostdin -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 - | .\SvtAv1EncApp.exe -i stdin --input-depth 10 --preset 7 --crf 14 --tune 0`
10-Bit Example Output:
Svt[info]: Number of logical cores available: 32
Svt[info]: Number of PPCS 88
Svt[info]: [asm level on system : up to avx2]
Svt[info]: [asm level selected : up to avx2]
Svt[info]: -------------------------------------------
Svt[info]: SVT [config]: main profile tier (auto) level (auto)
Svt[info]: SVT [config]: width / height / fps numerator / fps denominator : 1920 / 1080 / 60 / 1
Svt[info]: SVT [config]: bit-depth / color format / compressed 10-bit format : 10 / YUV420 / 0
Svt[info]: SVT [config]: preset / tune / pred struct : 7 / VQ / random access
Svt[info]: SVT [config]: gop size / mini-gop size / key-frame type : 321 / 16 / key frame
Svt[info]: SVT [config]: BRC mode / rate factor : CRF / 14
Svt[info]: -------------------------------------------
SUMMARY --------------------------------- Channel 1 --------------------------------
Total Frames Frame Rate Byte Count Bitrate
1801 60.00 fps 205898009 54875.65 kbps
Channel 1
Average Speed: 10.118 fps
Total Encoding Time: 178000 ms
Total Execution Time: 178001 ms
Average Latency: 9006 ms
Max Latency: 14000 ms
Commit 0ba91fc4 (the last known good commit) Results:
| Preset | Bit Depth | FPS |
|---|---|---|
| 6 | 8 | 15.526 |
| 6 | 10 | 14.408 |
| 7 | 8 | 30.525 |
| 7 | 10 | 26.88 |
| 8 | 8 | 66.703 |
| 8 | 10 | 58.094 |
| 9 | 8 | 100.052 |
| 9 | 10 | 78.304 |
| 10 | 8 | 128.642 |
| 10 | 10 | 100.055 |
| 11 | 8 | 138.536 |
| 11 | 10 | 112.561 |
| 12 | 8 | 180.088 |
| 12 | 10 | 138.528 |
Commit 1d842ee6 (the first known bad commit) Results:
| Preset | Bit Depth | FPS |
|---|---|---|
| 6 | 8 | 15.263 |
| 6 | 10 | 5.772 |
| 7 | 8 | 31.051 |
| 7 | 10 | 10.175 |
| 8 | 8 | 66.703 |
| 8 | 10 | 27.707 |
| 9 | 8 | 100.053 |
| 9 | 10 | 26.485 |
| 10 | 8 | 128.641 |
| 10 | 10 | 33.981 |
| 11 | 8 | 138.535 |
| 11 | 10 | 60.033 |
| 12 | 8 | 180.087 |
| 12 | 10 | 94.789 |
Here's a bar graph that shows the relative performance change for each preset and bit depth, in other words new FPS / old FPS.
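For anyone who wants the numbers rather than the picture, the same new FPS / old FPS ratios can be recomputed from the two tables above. This is just a quick sketch: the FPS values are copied from the tables, while the names `good`, `bad`, and `relative_change` are ones I made up for illustration and aren't part of SVT-AV1 or FFmpeg.

```python
# FPS figures copied from the two result tables: preset -> (8-bit FPS, 10-bit FPS).
# "good" is commit 0ba91fc4, "bad" is commit 1d842ee6.
good = {
    6: (15.526, 14.408),
    7: (30.525, 26.88),
    8: (66.703, 58.094),
    9: (100.052, 78.304),
    10: (128.642, 100.055),
    11: (138.536, 112.561),
    12: (180.088, 138.528),
}
bad = {
    6: (15.263, 5.772),
    7: (31.051, 10.175),
    8: (66.703, 27.707),
    9: (100.053, 26.485),
    10: (128.641, 33.981),
    11: (138.535, 60.033),
    12: (180.087, 94.789),
}


def relative_change(old, new):
    """Return preset -> (8-bit ratio, 10-bit ratio), where ratio = new FPS / old FPS."""
    return {
        preset: (new[preset][0] / old[preset][0], new[preset][1] / old[preset][1])
        for preset in old
    }


ratios = relative_change(good, bad)
for preset, (r8, r10) in sorted(ratios.items()):
    print(f"preset {preset:>2}: 8-bit {r8:.3f}  10-bit {r10:.3f}")
```

A ratio near 1.0 means no change between the two commits; anything well below 1.0 is a slowdown.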
