Reshaping Expression in GPU Device Code
Describe the feature you would like to be implemented.
It should be possible to reshape an Eigen expression in GPU device code, e.g., product.reshaped(), where product is a Product expression.
Would such a feature be useful for other users? Why?
This would allow linear (single-index) access to an Eigen matrix expression in GPU code, where each thread may compute only one element and has no need for row-column indexing. Forcing row-column indexing may cost performance, because each thread must first derive a row and a column index from g.thread_rank() (in cooperative-groups terms).
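To make the extra index arithmetic concrete, here is a minimal sketch of what each thread has to compute today when only row-column access is available. It assumes column-major storage (Eigen's default); the helper names rank_to_row_col and row_col_to_rank are hypothetical and not part of Eigen.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not an Eigen function): converts a flat, per-thread
// rank into (row, col) for a column-major matrix with `rows` rows.
// This is the modulo/divide work that linear access would make unnecessary.
inline std::pair<std::size_t, std::size_t> rank_to_row_col(std::size_t rank,
                                                           std::size_t rows) {
    return {rank % rows, rank / rows};
}

// Hypothetical helper: the inverse mapping, from (row, col) back to the
// flat column-major index.
inline std::size_t row_col_to_rank(std::size_t row, std::size_t col,
                                   std::size_t rows) {
    return col * rows + row;
}
```

In a kernel, rank would come from g.thread_rank(). With a working reshaped() the thread could index the expression with rank directly instead of calling a helper like rank_to_row_col first.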
Any hints on how to implement the requested feature?
The compiler reports that a function named index_remap is host-only, which appears to be what prevents reshaped() from being used in device code.
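One possible direction, sketched here under assumptions rather than taken from Eigen's source: annotate the remapping function so that it compiles for both host and device. Under a GPU compiler, Eigen's EIGEN_DEVICE_FUNC macro expands to __host__ __device__; the stand-in macro SKETCH_DEVICE_FUNC, the RowCol struct, and sketch_index_remap below are hypothetical and only illustrate the pattern for a column-major reshape.

```cpp
// Stand-in for EIGEN_DEVICE_FUNC so this sketch also compiles without CUDA/HIP.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Plain struct for the result, so the sketch does not depend on
// standard-library types inside device code.
struct RowCol {
    long row;
    long col;
};

// Hypothetical remap (an assumption about what such a function does):
// takes (row, col) in a reshaped column-major view with `new_rows` rows
// and returns the matching (row, col) in the underlying column-major
// expression with `src_rows` rows.
SKETCH_DEVICE_FUNC inline RowCol sketch_index_remap(long row, long col,
                                                    long new_rows,
                                                    long src_rows) {
    const long flat = col * new_rows + row;  // flat index in the reshaped view
    return RowCol{flat % src_rows, flat / src_rows};
}
```

For example, viewing a 3x2 expression as 2x3, sketch_index_remap(1, 1, 2, 3) yields row 0, column 1 of the original expression. Whether annotating the real index_remap is sufficient, or whether other functions on the same call path are also host-only, would need to be checked against the Eigen sources.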