Missing tests for non-square kernel transpose
Describe the feature you would like to be implemented.
ptranspose tests for non-square kernels
Would such a feature be useful for other users? Why?
Any hints on how to implement the requested feature?
Additional resources
Currently, ptranspose is tested only with this piece of code:
internal::PacketBlock<Packet> kernel;
for (int i = 0; i < PacketSize; ++i) {
  kernel.packet[i] = internal::pload<Packet>(data1 + i * PacketSize);
}
ptranspose(kernel);
....
This code creates only square kernels, so the specializations for Nx4 kernels, which are present in multiple architectures, are not tested. This can be a problem when bringing up a new architecture with an incorrect implementation of a non-square kernel transpose: the packetmath tests will pass but, for example, the product_small tests will fail, which can be confusing. I volunteer to add this test.
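As a rough illustration of what such a test could compare against, here is a minimal scalar sketch of the expected result for a non-square block. It does not use Eigen: FakePacket, FakeBlock, make_iota_block and reference_transpose are hypothetical stand-ins invented for this example. It also assumes that transposing a block of N packets writes the transposed PacketSize x N matrix contiguously and re-reads it as N packets; that layout should be checked against the actual architecture specializations before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-in for a SIMD packet with PacketSize lanes.
template <std::size_t PacketSize>
using FakePacket = std::array<float, PacketSize>;

// Hypothetical stand-in for a block of N packets (N rows, PacketSize columns).
template <std::size_t PacketSize, std::size_t N>
struct FakeBlock {
  FakePacket<PacketSize> packet[N];
};

// Fill a block with 0, 1, 2, ... in row-major order, mimicking packets
// loaded from data1 + i * PacketSize.
template <std::size_t PacketSize, std::size_t N>
FakeBlock<PacketSize, N> make_iota_block() {
  FakeBlock<PacketSize, N> block;
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < PacketSize; ++i)
      block.packet[j][i] = static_cast<float>(j * PacketSize + i);
  return block;
}

// Reference result under the ASSUMED layout: the PacketSize x N transpose is
// written contiguously, then split back into N packets of PacketSize lanes.
template <std::size_t PacketSize, std::size_t N>
FakeBlock<PacketSize, N> reference_transpose(const FakeBlock<PacketSize, N>& in) {
  std::array<float, PacketSize * N> flat{};
  std::size_t k = 0;
  for (std::size_t i = 0; i < PacketSize; ++i)   // column of the input
    for (std::size_t j = 0; j < N; ++j)          // row of the input
      flat[k++] = in.packet[j][i];

  FakeBlock<PacketSize, N> out;
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < PacketSize; ++i)
      out.packet[j][i] = flat[j * PacketSize + i];
  return out;
}
```

In a real test, the output of ptranspose on a 4-packet kernel would be stored packet by packet and compared lane by lane against such a reference, guarded so that it only runs when PacketSize is larger than 4 and a multiple of 4. For N equal to PacketSize the sketch reduces to the ordinary square transpose.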