realm: Refactor cuda allocation paths
- Refactor cuda allocation paths into GPUAllocation class
- This allows us to abstract away the allocation logic from its actual use
- This allows us to unify the different allocation mechanisms and take advantage of different sharing support
- This allows us to explicitly define a lifetime and ownership for the allocations (based on the GPUAllocation object lifetime), preventing leaks and use-after-free issues
- Refactor legacy cuda IPC paths to reduce the number of active messages and waiting (cutting the number of active messages down from 3 to 1 per node)
- Previous implementation sent a request message before waiting for a receive message, then later needed to send a release message before exiting
- This change instead has each node broadcast all handles asynchronously and then wait until all the ipc peers have sent their handles, with no release message necessary
- The active message handler can be triggered early, before the cuda module is initialized, so the handler now waits for that initialization to complete
- Eventually this path will be replaced with the ipc_mailbox, removing active messages entirely
- Refactor cuda error handling to use log_gpu
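
To illustrate the ownership point above (allocation lifetime tied to an owning object), here is a minimal sketch of the general RAII pattern. It is not the GPUAllocation class from this change: ScopedAllocation, its Deleter callback, and count_releases_after_moves are hypothetical names invented for this example, and plain host memory stands in for a device allocation so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <utility>

// Hypothetical stand-in for illustration only; NOT Realm's GPUAllocation.
// Owns one allocation and runs a release callback exactly once.
class ScopedAllocation {
public:
  using Deleter = std::function<void(void *)>;

  ScopedAllocation() = default;
  ScopedAllocation(void *ptr, std::size_t size, Deleter deleter)
    : ptr_(ptr), size_(size), deleter_(std::move(deleter)) {}

  // Move-only: exactly one object owns the allocation at any time.
  ScopedAllocation(const ScopedAllocation &) = delete;
  ScopedAllocation &operator=(const ScopedAllocation &) = delete;

  ScopedAllocation(ScopedAllocation &&other) noexcept
    : ptr_(std::exchange(other.ptr_, nullptr)),
      size_(std::exchange(other.size_, 0)),
      deleter_(std::move(other.deleter_)) {}

  ScopedAllocation &operator=(ScopedAllocation &&other) noexcept {
    if (this != &other) {
      reset(); // release whatever this object currently owns
      ptr_ = std::exchange(other.ptr_, nullptr);
      size_ = std::exchange(other.size_, 0);
      deleter_ = std::move(other.deleter_);
    }
    return *this;
  }

  ~ScopedAllocation() { reset(); }

  void *get() const { return ptr_; }
  std::size_t size() const { return size_; }
  explicit operator bool() const { return ptr_ != nullptr; }

private:
  void reset() {
    if (ptr_ != nullptr && deleter_) deleter_(ptr_);
    ptr_ = nullptr;
    size_ = 0;
  }

  void *ptr_ = nullptr;
  std::size_t size_ = 0;
  Deleter deleter_;
};

// Moves ownership through three objects and reports how many times the
// release callback ran once all of them have gone out of scope.
inline int count_releases_after_moves() {
  int releases = 0;
  {
    // Host memory stands in for a device allocation (assumption).
    ScopedAllocation a(::operator new(64), 64, [&releases](void *p) {
      ::operator delete(p);
      ++releases;
    });
    ScopedAllocation b(std::move(a));
    assert(!a); // a gave up ownership
    ScopedAllocation c;
    c = std::move(b);
    assert(!b && c && c.size() == 64); // only c owns the memory now
  } // c is the only owner left, so the callback runs once here
  return releases;
}
```

Because copies are deleted and a moved-from object holds a null pointer, the release callback cannot run twice for the same allocation, and it cannot be skipped while an owner still exists. In a real GPU path the callback would wrap whichever free or unmap call matches the way the memory was obtained.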