Use 64-bit indices for row offsets in sparse matrices

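Standard practice is to configure --with-64-bit-indices when the global problem size exceeds 2B (more precisely, 2^31 - 1, the largest value of a signed 32-bit PetscInt). This incurs a significant run-time cost because every column index becomes 64-bit, which raises the memory traffic per stored nonzero, sizeof(PetscInt) + sizeof(PetscScalar), from 12 to 16 bytes for real double precision.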
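A back-of-the-envelope sketch of both points, assuming real double-precision scalars (8 bytes) and counting only the three CSR arrays (vector traffic ignored). csr_bytes and fits_in_int32 are hypothetical helpers for illustration, not PETSc functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: storage in bytes of the three CSR arrays for a matrix
   with nrows rows and nnz stored entries. Row starts hold nrows + 1 entries;
   every nonzero stores one column index and one value. */
static uint64_t csr_bytes(uint64_t nrows, uint64_t nnz, size_t offset_size,
                          size_t index_size, size_t scalar_size)
{
  return (nrows + 1) * offset_size + nnz * (index_size + scalar_size);
}

/* Hypothetical helper: does a count fit in a signed 32-bit integer,
   i.e. is it at most 2^31 - 1 = 2147483647? */
static int fits_in_int32(uint64_t n)
{
  return n <= (uint64_t)INT32_MAX;
}
```

For example, csr_bytes(1000, 10000, 4, 4, 8) is 124004 bytes while csr_bytes(1000, 10000, 8, 8, 8) is 168008 bytes: the per-nonzero term dominates, growing from 12 to 16 bytes per entry.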

There is another situation that requires 64-bit indexing: when the number of matrix entries on a single MPI rank exceeds 2B. This is increasingly common with fatter nodes and growing interest in coprocessors and threading within MPI ranks. It is also less obvious and less controllable for users, since the matrix might be the result of a sparse matrix product or a factorization with fill.

This case can be alleviated by storing only the row offsets as 64-bit, which avoids both the main run-time overhead of 64-bit column indices and the need for user code to handle 64-bit indices. Concretely, the row starts (the a->i arrays) would be stored as size_t or, if a signed variant is desired, ptrdiff_t. (I'm not aware of any current use for signed integers in this context, but downward loop indexing is cleaner with signed arithmetic.)
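A minimal sketch of the proposed layout, assuming real double scalars: row starts use ptrdiff_t (the signed variant), while column indices stay 32-bit. CSRMat, csr_mult, and csr_upper_solve are hypothetical names, not PETSc's actual Mat_SeqAIJ code; the triangular solve additionally assumes each row stores its diagonal first, purely to illustrate a downward loop with a signed counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR layout: entries of row i live in [rowstart[i], rowstart[i+1]).
   Only rowstart must address more than 2^31 - 1 entries; 32-bit column
   indices suffice as long as the number of columns is below 2^31. */
typedef struct {
  int32_t    nrows;
  ptrdiff_t *rowstart; /* nrows + 1 signed offsets (64-bit on LP64 platforms) */
  int32_t   *colidx;   /* column index of each stored entry */
  double    *val;      /* value of each stored entry */
} CSRMat;

/* y = A x. The entry counter k uses the offset type, not int32_t,
   so it cannot overflow when a rank holds more than 2^31 - 1 entries. */
static void csr_mult(const CSRMat *A, const double *x, double *y)
{
  for (int32_t i = 0; i < A->nrows; i++) {
    double sum = 0.0;
    for (ptrdiff_t k = A->rowstart[i]; k < A->rowstart[i + 1]; k++) sum += A->val[k] * x[A->colidx[k]];
    y[i] = sum;
  }
}

/* Solve U x = b for upper-triangular U, assuming (hypothetically) that the
   diagonal is the first stored entry of each row. Rows are visited from last
   to first; with a signed counter "i >= 0" is a valid stopping test, whereas
   an unsigned counter would wrap around after reaching zero. */
static void csr_upper_solve(const CSRMat *U, const double *b, double *x)
{
  for (ptrdiff_t i = (ptrdiff_t)U->nrows - 1; i >= 0; i--) {
    ptrdiff_t diag = U->rowstart[i];
    assert(U->colidx[diag] == i); /* checks the diagonal-first assumption */
    double sum = b[i];
    for (ptrdiff_t k = diag + 1; k < U->rowstart[i + 1]; k++) sum -= U->val[k] * x[U->colidx[k]];
    x[i] = sum / U->val[diag];
  }
}
```

In this layout only the nrows + 1 row starts grow to 8 bytes each; the per-nonzero cost stays at 4 + 8 bytes, and any column indices exposed to user code remain 32-bit.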

Edited Sep 14, 2020 by Jed Brown