Detect "effectively inner/outer" chipping in TensorChipping
We already have an optimization for inner/outer chipping, where data is loaded from the underlying tensor using stride manipulation. Extend it to detect "effectively inner/outer" chipping, where the product of the dimensions on the inner (or outer) side of the chipped dimension is 1,
so those dimensions do not affect the strides.
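
To illustrate the check described above, here is a minimal standalone sketch, assuming a column-major layout (dims[0] varies fastest). It is not the code in TensorChipping: `ChipKind` and `classify_chip` are hypothetical names used only for this example, and the real implementation may compute the condition differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not part of Eigen's API.
enum class ChipKind { Inner, Outer, General };

// Classifies chipping along `chip_dim` for a column-major tensor with
// extents `dims` (assumption: dims[0] is the fastest-varying dimension).
inline ChipKind classify_chip(const std::vector<std::ptrdiff_t>& dims,
                              std::size_t chip_dim) {
  assert(chip_dim < dims.size());

  // Product of the extents on the inner side of the chipped dimension.
  // In column-major order this equals the stride of `chip_dim`.
  std::ptrdiff_t inner = 1;
  for (std::size_t i = 0; i < chip_dim; ++i) inner *= dims[i];

  // Product of the extents on the outer side of the chipped dimension.
  std::ptrdiff_t outer = 1;
  for (std::size_t i = chip_dim + 1; i < dims.size(); ++i) outer *= dims[i];

  // A product of 1 means every dimension on that side has extent 1, so
  // the chipped dimension behaves like the true innermost/outermost one.
  if (inner == 1) return ChipKind::Inner;
  if (outer == 1) return ChipKind::Outer;
  return ChipKind::General;
}
```

For example, chipping dimension 2 of a tensor with extents {1, 1, 5, 7} is classified as inner in this sketch, because both dimensions before it have extent 1 and the chipped dimension therefore has stride 1.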