Fix memory leaks and bugs in MATAIJCUSPARSE
There are multiple bugs and memory leaks when we repeatedly assemble a matrix.
MATAIJCUSPARSE uses compressed-row CSR for MatMult, but non-compressed-row CSC (obtained by transposing the matrix) for MatMultTranspose. It therefore stores two copies of the matrix on the GPU, one for MatMult and one for MatMultTranspose, which makes the code quite complex.
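To illustrate why the transpose needs its own copy, here is a minimal host-side sketch in plain C of converting CSR to CSC. It is not the PETSc or cuSPARSE code path; `csr_to_csc` and all of its parameter names are hypothetical, and the real implementation runs on the GPU. The CSC arrays of A are exactly the CSR arrays of A^T, so this second set of index and value arrays is what has to be kept alongside the original.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, NOT a PETSc or cuSPARSE function.
 * Converts an m x n matrix from CSR (rowptr, colidx, val) to CSC
 * (colptr, rowidx, cval). The caller allocates colptr[n+1],
 * rowidx[nnz] and cval[nnz]. Returns 0 on success, 1 on allocation failure. */
int csr_to_csc(int m, int n, const int *rowptr, const int *colidx, const double *val,
               int *colptr, int *rowidx, double *cval)
{
  assert(m >= 0 && n >= 0);
  int  nnz  = rowptr[m];
  int *next = calloc((size_t)n + 1, sizeof(int)); /* next free slot per column */
  if (!next) return 1;

  for (int j = 0; j <= n; j++) colptr[j] = 0;
  for (int k = 0; k < nnz; k++) colptr[colidx[k] + 1]++;  /* count entries per column */
  for (int j = 0; j < n; j++) colptr[j + 1] += colptr[j]; /* prefix sum gives column starts */
  for (int j = 0; j < n; j++) next[j] = colptr[j];

  /* Scatter each CSR entry into its column; rows stay sorted within a column. */
  for (int i = 0; i < m; i++) {
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
      int dst = next[colidx[k]]++;
      rowidx[dst] = i;
      cval[dst]   = val[k];
    }
  }
  free(next);
  return 0;
}
```

The output arrays have the same total size as the input, which is why keeping both layouts roughly doubles the storage for the matrix.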
In repeated matrix assembly, we need to efficiently and correctly free or reuse the existing data structures.
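The free-or-reuse requirement can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code in this merge request: `TransposeCache`, `cache_update` and `cache_destroy` are hypothetical names, host `malloc`/`free` stand in for GPU allocations, and a matching entry count is used as a simplified proxy for "the nonzero pattern did not change".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the second matrix copy kept for transpose products. */
typedef struct {
  double *val;    /* cached values, NULL when nothing is cached */
  size_t  nnz;    /* number of entries val can hold */
  int     allocs; /* allocation counter, for demonstration only */
} TransposeCache;

/* Release the cached copy. Safe to call more than once. */
void cache_destroy(TransposeCache *c)
{
  free(c->val);
  c->val = NULL;
  c->nnz = 0;
}

/* Call after every assembly. Reuses the buffer when the size matches
 * (assumption: same size means same nonzero pattern); otherwise frees the
 * stale buffer before allocating a new one. Returns 0 on success. */
int cache_update(TransposeCache *c, const double *val, size_t nnz)
{
  /* Overwriting c->val without this free would leak on every reassembly. */
  if (c->val && c->nnz != nnz) cache_destroy(c);

  if (!c->val && nnz > 0) {
    c->val = malloc(nnz * sizeof(double));
    if (!c->val) return 1;
    c->nnz = nnz;
    c->allocs++;
  }
  if (nnz > 0) memcpy(c->val, val, nnz * sizeof(double)); /* refresh values only */
  return 0;
}
```

Starting from a zero-initialized `TransposeCache`, repeated calls with the same `nnz` allocate once and only copy values, while a change in `nnz` frees the old buffer first. A real implementation would need a stricter check than the entry count to decide whether the cached structure is still valid.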
Edited by Junchao Zhang