WIP: CUDA implementation for computing a parallel distance-1 maximal independent set
Current status:
- It works on SUMMIT.
- It does not work on systems without CUDA-aware MPI support (e.g. without Open MPI). Merging with !2205 (merged) can make the MPI communication across GPUs fall back to using the CPU as a hub.
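
For readers unfamiliar with the term: a distance-1 maximal independent set (MIS) is a set of vertices in which no two are adjacent and to which no further vertex can be added. The serial C++ sketch below is not the CUDA code in this MR. It only illustrates the property with a round-based, Luby-style selection, a pattern commonly used for parallel MIS. The `CsrGraph` type, the `priority` hash, and all function names are hypothetical, and the graph is assumed to be symmetric (undirected).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR adjacency structure, for illustration only (not a PETSc type).
struct CsrGraph {
  std::vector<int> rowptr;  // size n + 1
  std::vector<int> colidx;  // concatenated neighbor lists
  int size() const { return static_cast<int>(rowptr.size()) - 1; }
};

enum class State : std::uint8_t { Undecided, In, Out };

// Deterministic pseudo-random priority per vertex (splitmix64-style mixing).
static std::uint64_t priority(int v) {
  std::uint64_t x = static_cast<std::uint64_t>(v) + 0x9E3779B97F4A7C15ULL;
  x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
  x ^= x >> 27; x *= 0x94D049BB133111EBULL;
  x ^= x >> 31;
  return x;
}

// Strict total order on vertices: priority first, vertex id as tie-breaker.
static bool beats(int u, int v) {
  const std::uint64_t pu = priority(u), pv = priority(v);
  return pu != pv ? pu > pv : u > v;
}

// Each round: undecided vertices that beat all undecided neighbors join the
// set, then their neighbors are excluded. Every per-vertex loop body reads
// only the previous round's state, so it could run as one thread per vertex.
std::vector<State> mis_distance1(const CsrGraph& g) {
  const int n = g.size();
  std::vector<State> state(n, State::Undecided);
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<State> next = state;
    for (int v = 0; v < n; ++v) {
      if (state[v] != State::Undecided) continue;
      bool is_max = true;
      for (int k = g.rowptr[v]; k < g.rowptr[v + 1]; ++k) {
        const int u = g.colidx[k];
        if (u != v && state[u] == State::Undecided && beats(u, v)) {
          is_max = false;
          break;
        }
      }
      if (is_max) { next[v] = State::In; changed = true; }
    }
    for (int v = 0; v < n; ++v) {
      if (next[v] != State::In) continue;
      for (int k = g.rowptr[v]; k < g.rowptr[v + 1]; ++k) {
        const int u = g.colidx[k];
        if (u != v && next[u] == State::Undecided) next[u] = State::Out;
      }
    }
    state.swap(next);
  }
  return state;
}

// Checks independence (no two adjacent In vertices) and maximality
// (every Out vertex has at least one In neighbor).
bool is_maximal_independent(const CsrGraph& g, const std::vector<State>& s) {
  for (int v = 0; v < g.size(); ++v) {
    if (s[v] == State::Undecided) return false;
    bool has_in_neighbor = false;
    for (int k = g.rowptr[v]; k < g.rowptr[v + 1]; ++k) {
      const int u = g.colidx[k];
      if (u != v && s[u] == State::In) has_in_neighbor = true;
    }
    if (s[v] == State::In && has_in_neighbor) return false;
    if (s[v] == State::Out && !has_in_neighbor) return false;
  }
  return true;
}
```

In a multi-GPU setting, the states of vertices on partition boundaries would have to be exchanged between ranks after each round. That exchange is presumably the communication step that needs either CUDA-aware MPI or the CPU fallback mentioned above.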
Edited by Satish Balay