Base kernel AVX code generation
Currently, each base kernel has a gencode method that generates the CUDA C++ code. To accommodate both GPU and AVX code generation, we will refactor this design to use a series of functions instead, i.e. gen_cuda, gen_avx, etc.
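
A rough sketch of what the function-based design could look like is given below. It is only an illustration under assumptions: the Linear kernel class, the (kernel, x, y) signatures, and the use of functools.singledispatch are hypothetical and not taken from the existing code base. Only the names gen_cuda and gen_avx come from the proposal above.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Linear:
    """Hypothetical base kernel k(x, y) = c * x * y, for illustration only."""
    c: float


@singledispatch
def gen_cuda(kernel, x, y):
    """Return a CUDA C++ expression for kernel(x, y); x and y are variable names."""
    raise NotImplementedError(f"no CUDA generator for {type(kernel).__name__}")


@singledispatch
def gen_avx(kernel, x, y):
    """Return an AVX intrinsics expression; x and y name __m256 variables."""
    raise NotImplementedError(f"no AVX generator for {type(kernel).__name__}")


@gen_cuda.register
def _(kernel: Linear, x, y):
    # Scalar single-precision arithmetic, evaluated per GPU thread.
    return f"({float(kernel.c)!r}f * {x} * {y})"


@gen_avx.register
def _(kernel: Linear, x, y):
    # Same formula on 8 packed floats: broadcast c, then multiply lane-wise.
    c = f"_mm256_set1_ps({float(kernel.c)!r}f)"
    return f"_mm256_mul_ps({c}, _mm256_mul_ps({x}, {y}))"


if __name__ == "__main__":
    k = Linear(2.0)
    print(gen_cuda(k, "x", "y"))
    print(gen_avx(k, "x", "y"))
```

With this layout, supporting another backend would mean adding one more generator function rather than changing every kernel class, and a kernel with no generator for a given backend fails with NotImplementedError instead of silently emitting CUDA code.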
Edited by Yu-Hang "Maxin" Tang