Optimize division operations in TensorVolumePatch

Describe the feature you would like to be implemented.

Eliminate one division when generating a Packet for a given index in TensorVolumePatch.h.

Would such a feature be useful for other users? Why?

This should reduce the number of CPU cycles, since division is an expensive operation.

Any hints on how to implement the requested feature?
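
One possible direction, as a rough sketch rather than the actual TensorVolumePatch.h code: when an index is split into a quotient and a remainder, the remainder can be derived from the quotient with a multiply and a subtract instead of a second division. The names `IndexSplit`, `splitTwoDivisions`, and `splitOneDivision` are hypothetical and used only for illustration. Whether this pattern applies to the packet path would need to be checked against the real code.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of Eigen: holds the two parts of a
// linear index split by a stride.
struct IndexSplit {
  std::int64_t quotient;
  std::int64_t remainder;
};

// Baseline: written with two division-class operations ('/' and '%').
inline IndexSplit splitTwoDivisions(std::int64_t index, std::int64_t stride) {
  return {index / stride, index % stride};
}

// Variant with a single division: the remainder is recovered as
// index - quotient * stride, which equals index % stride in C++.
inline IndexSplit splitOneDivision(std::int64_t index, std::int64_t stride) {
  const std::int64_t q = index / stride;
  return {q, index - q * stride};
}
```

Compilers can often fuse a plain `/` and `%` pair on their own, so the gain is most likely where the quotient comes from a precomputed fast-division helper that does not also yield the remainder. Benchmarking before and after would confirm whether the change pays off.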

Additional resources