Adding a basic skeleton to implement join_cols()
Based on the discussion on the IRC channel with @rcurtin and @zoq, I've started implementing join_cols().
The plan is to:
- implement join_cols() using cudaMemcpy() (and the equivalent for OpenCL)
- write a custom two-way kernel for join_cols() for two objects of different types
- do some quick benchmarking
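
To make the first step concrete, here is a rough host-side sketch of the copy pattern, assuming column-major storage as in Armadillo. `join_cols_sketch` is a hypothetical name and not part of this PR, and `std::memcpy` stands in for the device-to-device `cudaMemcpy()` call so the snippet runs without a GPU. Because the inputs are stacked vertically, each output column is a column of A followed by the matching column of B, so the copy has to be done per column rather than as one contiguous block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper for illustration only (not the API added in this PR).
// Vertically concatenates two column-major matrices with the same number of
// columns: A is a_rows x n_cols, B is b_rows x n_cols, and the result is
// (a_rows + b_rows) x n_cols.
std::vector<float> join_cols_sketch(const std::vector<float>& A, std::size_t a_rows,
                                    const std::vector<float>& B, std::size_t b_rows,
                                    std::size_t n_cols)
{
  assert(A.size() == a_rows * n_cols);
  assert(B.size() == b_rows * n_cols);

  const std::size_t out_rows = a_rows + b_rows;
  std::vector<float> out(out_rows * n_cols);

  for (std::size_t c = 0; c < n_cols; ++c)
  {
    // Start of column c in the output.
    float* dst = out.data() + c * out_rows;

    // Assumption: on the GPU, each std::memcpy below would become a
    // cudaMemcpy(..., cudaMemcpyDeviceToDevice) on device pointers.
    if (a_rows > 0)
      std::memcpy(dst, A.data() + c * a_rows, a_rows * sizeof(float));
    if (b_rows > 0)
      std::memcpy(dst + a_rows, B.data() + c * b_rows, b_rows * sizeof(float));
  }

  return out;
}
```

This only covers the case where both inputs share an element type; a raw byte copy cannot convert between types, which is why the mixed-type case needs its own kernel. Issuing one copy per column may also be slow for matrices with many columns, so a strided copy such as `cudaMemcpy2D()` could be worth comparing in the benchmarks.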