Fix MSVC+CUDA issues.
MSVC+CUDA again gets confused by diagonal and transpose: it is not able
to match the out-of-line definitions with the corresponding declarations.
Removed the internal typedefs and used the true types directly to get around this.
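
A minimal sketch of the pattern described above, assuming a CRTP base class with an out-of-line member definition. The names Base, Transpose, Matrix and check_transpose are hypothetical stand-ins, not the library's actual classes:

```cpp
// Hypothetical stand-in for an expression type wrapping another object.
template <typename Derived>
struct Transpose {
  const Derived& nested;
};

template <typename Derived>
struct Base {
  // Before (assumed shape): the declaration used an internal typedef.
  //   typedef Transpose<Derived> TransposeReturnType;
  //   TransposeReturnType transpose() const;
  // After: the declaration spells out the true type directly.
  Transpose<Derived> transpose() const;

  const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// The out-of-line definition repeats the same true type, so the compiler
// does not have to resolve a dependent typedef to pair it with the
// declaration above.
template <typename Derived>
Transpose<Derived> Base<Derived>::transpose() const {
  return Transpose<Derived>{derived()};
}

// Hypothetical derived type used only to instantiate the template.
struct Matrix : Base<Matrix> {
  int id = 7;
};

// Instantiates the out-of-line definition and reads through the wrapper.
int check_transpose() {
  Matrix m;
  return m.transpose().nested.id;
}
```

Both forms are valid C++; spelling out the type only avoids relying on the compiler to match a typedef in the definition against the one in the declaration.
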
MSVC also complained about not enough arguments being passed to a function-like macro, and about invalid friend declarations. Removed the unused macro argument and explicitly specified the friend classes to get around these.
Edited by Antonio Sánchez