realm: Refactor cuda allocation paths

Cory Perry US requested to merge cperry/shm-cuda2 into master
  • Refactor cuda allocation paths into GPUAllocation class
    • This allows us to abstract away the allocation logic from its actual use
    • This allows us to unify the different allocation mechanisms and take advantage of different sharing support
    • This allows us to explicitly define a lifetime and ownership for the allocations (based on the GPUAllocation object lifetime), preventing leaks and use-after-free issues
  • Refactor legacy cuda IPC paths to reduce the number of active messages and waiting (cutting the number of active messages down from 3 to 1 per node)
    • Previous implementation sent a request message before waiting for a receive message, then later needed to send a release message before exiting
    • This change instead has each node broadcast all handles asynchronously and then wait until all the ipc peers have sent their handles, with no release message necessary
    • The active message handler can be triggered early, before the cuda module is initialized, so it now waits for this initialization to complete
    • Eventually this path will be replaced with the ipc_mailbox, removing active messages entirely
  • Refactor cuda error handling to use log_gpu
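
To illustrate the ownership point in the first bullet, here is a minimal sketch of a move-only allocation wrapper. It is not the GPUAllocation class from this merge request: the name OwnedAllocation and all of its members are hypothetical, and std::malloc/std::free stand in for the real device allocation calls so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical sketch, NOT Realm's GPUAllocation. std::malloc/std::free are
// stand-ins (assumption) for the device allocation and free calls.
class OwnedAllocation {
public:
  OwnedAllocation() = default;

  // Acquire `bytes` of storage; this object owns it until destroyed or moved from.
  explicit OwnedAllocation(std::size_t bytes)
    : ptr_(std::malloc(bytes)), size_(ptr_ ? bytes : 0) {}

  // Copying is disabled so exactly one object owns each allocation.
  OwnedAllocation(const OwnedAllocation &) = delete;
  OwnedAllocation &operator=(const OwnedAllocation &) = delete;

  // Moving transfers ownership and leaves the source empty.
  OwnedAllocation(OwnedAllocation &&other) noexcept
    : ptr_(std::exchange(other.ptr_, nullptr)),
      size_(std::exchange(other.size_, 0)) {}

  OwnedAllocation &operator=(OwnedAllocation &&other) noexcept {
    if (this != &other) {
      release();  // free whatever we currently own before taking over
      ptr_ = std::exchange(other.ptr_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  // The allocation is freed when the owning object goes out of scope.
  ~OwnedAllocation() { release(); }

  void *get() const { return ptr_; }
  std::size_t size() const { return size_; }
  explicit operator bool() const { return ptr_ != nullptr; }

private:
  void release() {
    std::free(ptr_);  // free(nullptr) is a no-op
    ptr_ = nullptr;
    size_ = 0;
  }

  void *ptr_ = nullptr;
  std::size_t size_ = 0;
};

// Usage: ownership moves from `a` to `b`; nothing is freed twice or leaked.
inline bool demo_transfer() {
  OwnedAllocation a(256);
  assert(a && a.size() == 256);
  OwnedAllocation b(std::move(a));
  return !a && b && b.size() == 256;
}
```

Because the wrapper cannot be copied and frees in its destructor, the allocation's lifetime is exactly the lifetime of the owning object, which is the property the description credits with preventing leaks and use-after-free.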
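
The single-message IPC flow in the second bullet can be sketched roughly as follows. This is an illustration of the "broadcast once, then wait for all peers" pattern only: HandleExchange, PeerHandle, and every method name are hypothetical and do not come from Realm, and the transport that delivers the messages is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for an exported IPC handle (assumption, not a Realm type).
struct PeerHandle {
  std::uint64_t opaque;
};

// Hypothetical sketch: collects one handle per peer and lets the local node
// block until every expected peer has reported in.
class HandleExchange {
public:
  explicit HandleExchange(std::size_t expected_peers)
    : expected_(expected_peers) {}

  // Called once local module setup is done; unblocks handlers that ran early.
  void mark_initialized() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      initialized_ = true;
    }
    cv_.notify_all();
  }

  // Message handler: may fire before initialization, so it waits for it first.
  void on_handle_received(int peer, PeerHandle handle) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return initialized_; });
    handles_[peer] = handle;
    lock.unlock();
    cv_.notify_all();
  }

  // Blocks until all expected peers have sent a handle. No request or
  // release message is involved: one inbound message per peer is enough.
  std::map<int, PeerHandle> wait_for_all() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return handles_.size() == expected_; });
    return handles_;
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool initialized_ = false;
  std::size_t expected_;
  std::map<int, PeerHandle> handles_;
};
```

In this sketch each node would send its own handles to every peer, call mark_initialized() once its local setup finishes, and then call wait_for_all(); the handler side is what the incoming message would invoke, possibly on another thread.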
