Optimization clamping kc to 360 or 240 is not justified by a comment, and is detrimental on Nexus 4
Submitted by Benoit Jacob
Assigned to Nobody
Link to original bugzilla bug (#939)
Description
computeProductBlockingSizes has this code limiting the value of the kc blocking parameter to 360, or 240 for big Scalar types:
k = std::min<SizeType>(k,sizeof(LhsScalar)<=4 ? 360 : 240);
The other blocking parameters mc and nc are also clamped, though to higher values.
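To make the effect of the clamp concrete, here is a minimal sketch that isolates the quoted line. The helper name clamp_kc is hypothetical and is not part of Eigen; it reproduces only the kc clamp, not the rest of computeProductBlockingSizes, and it assumes the usual sizeof(float) == 4 and sizeof(double) == 8.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not an Eigen function): applies only the kc clamp
// quoted above, so its effect on a candidate k can be inspected in isolation.
template <typename LhsScalar, typename SizeType>
SizeType clamp_kc(SizeType k) {
  // Scalars of at most 4 bytes are capped at 360, larger scalars at 240.
  return std::min<SizeType>(k, sizeof(LhsScalar) <= 4 ? 360 : 240);
}
```

Under those assumptions, clamp_kc<float>(std::ptrdiff_t(1024)) yields 360 and clamp_kc<double>(std::ptrdiff_t(1024)) yields 240, while a candidate of 128 passes through unchanged. With this clamp in place, the power-of-two values 512 and 1024 can never be selected for kc.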
This optimization should be justified by a comment. Please add one?
Moreover, this optimization is detrimental on a Nexus 4 (ARM) device. See the attachment in bug #937 comment 3. It shows that for large enough products, the optimal power-of-two value of kc can easily be 512, and for 1024^3 matrix products, kc=1024 and kc=512 both perform optimally, while kc<=256 performs at least 10% worse (see the bottom of that file for the 1024^3 case).
On a Core i7, the data in bug #937 comment 1 does confirm that kc=256 is the highest possible optimal power-of-two size. Still, I would like to understand where the value 360 comes from.