Further parallelize image prebuilding
What does this MR do?
Further parallelizes image prebuilding to speed up CI. Test jobs now depend only on the x86 images, so they can start as soon as those images are built.
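For readers unfamiliar with the pattern, here is a minimal sketch of what such a dependency setup could look like, assuming the pipeline is defined in GitLab CI. All job names, stage names, script paths, and the non-x86 architecture are hypothetical and are not taken from this MR's diff. The prebuild jobs share a stage and so run in parallel, while the test job lists only the x86 prebuild job under `needs`.

```yaml
# Hypothetical .gitlab-ci.yml fragment; every name below is illustrative only.
stages:
  - prebuild
  - test

# Jobs in the same stage run in parallel, one per image.
prebuild-image-x86:
  stage: prebuild
  script:
    - ./ci/build-image.sh x86      # hypothetical build script

prebuild-image-other-arch:
  stage: prebuild
  script:
    - ./ci/build-image.sh other    # hypothetical build script

# `needs` lets this job start as soon as the x86 image job finishes,
# without waiting for the rest of the prebuild stage.
test:
  stage: test
  needs:
    - prebuild-image-x86
  script:
    - ./ci/run-tests.sh            # hypothetical test entry point
```

With a layout like this, a slow non-x86 image build no longer delays the start of the tests. It still has to succeed for the pipeline as a whole to pass.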
Why was this MR needed?
Performance: to shorten overall CI pipeline time.
What's the best way to test this MR?
N/A