Invert rows and depth in non-vectorized portion of packing (PowerPC).
Invert rows and depth in non-vectorized portion of packing for RHS (PowerPC).
The bug shows up as incorrect results in the following tests:
export EIGEN_SEED=1629216664
test/product_syrk_3
test/product_mmtr_3
The previous packing did not always make it possible to determine the correct end of a row, so in some cases incorrect values were picked up from the wrong locations.
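As a rough illustration of why the loop order matters when packing leftover (non-vectorized) columns, here is a minimal standalone sketch. It is not the Eigen code: packDepthOuter and packColumnOuter are hypothetical helpers, the column-major float input is an assumption, and the sketch does not claim to reproduce the exact layout used before or after this change. It only shows that swapping the two loops changes where each value lands in the packed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of Eigen. Both pack the leftover columns
// [firstCol, cols) of a column-major depth x cols matrix stored in rhs.

// Outer loop over depth: the leftover values of one depth step are adjacent,
// so a reader must know the leftover count to find where each step ends.
std::vector<float> packDepthOuter(const std::vector<float>& rhs, std::size_t depth,
                                  std::size_t cols, std::size_t firstCol) {
  assert(firstCol <= cols && rhs.size() == depth * cols);
  std::vector<float> out;
  for (std::size_t k = 0; k < depth; ++k)
    for (std::size_t j = firstCol; j < cols; ++j)
      out.push_back(rhs[j * depth + k]);
  return out;
}

// Outer loop over columns: each leftover column is stored contiguously,
// so leftover column c starts at offset c * depth in the packed buffer.
std::vector<float> packColumnOuter(const std::vector<float>& rhs, std::size_t depth,
                                   std::size_t cols, std::size_t firstCol) {
  assert(firstCol <= cols && rhs.size() == depth * cols);
  std::vector<float> out;
  for (std::size_t j = firstCol; j < cols; ++j)
    for (std::size_t k = 0; k < depth; ++k)
      out.push_back(rhs[j * depth + k]);
  return out;
}
```

For a 2 x 3 input with columns {1,2}, {3,4}, {5,6} and firstCol = 1, the first helper yields 3 5 4 6 and the second yields 3 4 5 6. A kernel that assumes one layout while the packing produces the other reads values from the wrong locations.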
In the process of fixing this, I simplified the code and improved performance: packing of the extra rows is now 5x faster, and the overall gain is 10%.