Add concurrent memspeed test
This is a trimmed-down version of the original memspeed test. It supports concurrent execution of copy events defined in an input text file. The implementation of the file format (CopyEvent) is very naive, but it makes it possible to stack a number of copies together. There are some limitations, for example in how many dimensions can be handled at once. We could also consider merging this test with the original memspeed test. So far, it has only been tested in multi-node GPU-to-GPU scenarios.
: Manual testing
Edited by apryakhin