Streaming API with async/.await

https://github.com/rust-accel/accel/issues/65

Wrap the CUDA streaming API using async/await.

TODO

Problems

How should host memory that a stream still uses be handled?

let a = vec![1, 2, 3];
let mem = DeviceMemory::zeros(3);
mem.copy_from_stream(&a, &mut stream); // enqueued only; runs after earlier jobs in the stream finish
drop(a);                               // is `a` dropped before the memcpy starts?

from #41 (comment 331562896)
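One possible direction, sketched here under stated assumptions rather than as accel's actual design: let the asynchronous copy take ownership of the host buffer and hand it back only when the copy has completed, so `drop(a)` cannot run while the copy is still pending. Everything below is hypothetical and uses only the standard library: `DeviceMemory` is a plain `Vec` standing in for real device memory, `copy_from_owned` is an invented method name, no CUDA call is made, and `block_on` is a toy busy-polling executor included only to make the sketch runnable.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for device memory (a real type would wrap a CUDA pointer).
struct DeviceMemory {
    data: Vec<i32>,
}

impl DeviceMemory {
    fn zeros(n: usize) -> Self {
        Self { data: vec![0; n] }
    }

    /// Hypothetical async copy: the host buffer is moved in, so the caller
    /// cannot drop it early, and it is returned once the copy is done.
    /// Panics if the lengths differ (behavior of `copy_from_slice`).
    async fn copy_from_owned(&mut self, host: Vec<i32>) -> Vec<i32> {
        self.data.copy_from_slice(&host); // placeholder for an async memcpy on a stream
        host
    }
}

/// Waker that does nothing; sufficient for a busy-polling toy executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls the future in a loop until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Runs the copy and returns (host buffer, device contents).
fn demo() -> (Vec<i32>, Vec<i32>) {
    let a = vec![1, 2, 3];
    let mut mem = DeviceMemory::zeros(3);
    // `a` is moved into the future; it only comes back after completion.
    let a = block_on(mem.copy_from_owned(a));
    (a, mem.data)
}

fn main() {
    let (host, device) = demo();
    println!("host = {:?}, device = {:?}", host, device);
}
```

Moving the buffer rather than borrowing it is a deliberate choice in this sketch: a future that only borrows `a` can be leaked with `std::mem::forget`, which would end the borrow while a real stream might still be reading the memory.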

Edited by termoshtt