Cache compiled kernels
This MR adds support for caching compiled kernels, whether they are compiled with CUDA or OpenCL. By default the cache (on Linux) will attempt to use `/var/cache/bandicoot`, followed by `$HOME/.bandicoot/cache/`. OS X will use just `$HOME/.bandicoot/cache/`. Kernels are cached per device, and are invalidated when the Bandicoot version number doesn't match.
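To make the lookup order concrete, here is a rough sketch of the directory fallback and the per-device, per-version keying described above. It is an illustration only, not the code in this MR: the helper names (`candidate_cache_dirs`, `first_usable_dir`, `cache_file_name`), the `device_id` string, and the file naming scheme are all hypothetical, and putting the version into the file name is just one way a version mismatch could turn into a cache miss.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: candidate cache directories in priority order.
// Linux tries the system-wide directory first, then the per-user one;
// other platforms (e.g. OS X) only get the per-user directory.
inline std::vector<fs::path> candidate_cache_dirs(const bool is_linux,
                                                  const std::string& home)
{
  std::vector<fs::path> dirs;
  if (is_linux)
  {
    dirs.push_back(fs::path("/var/cache/bandicoot"));
  }
  if (!home.empty())
  {
    dirs.push_back(fs::path(home) / ".bandicoot" / "cache");
  }
  return dirs;
}

// Hypothetical helper: return the first candidate that exists or can be
// created, or an empty path if none can. This sketch does not check that
// the directory is writable; a real implementation would need to.
inline fs::path first_usable_dir(const std::vector<fs::path>& dirs)
{
  for (const fs::path& d : dirs)
  {
    std::error_code ec;
    fs::create_directories(d, ec);
    if (!ec && fs::is_directory(d, ec))
    {
      return d;
    }
  }
  return fs::path();
}

// Hypothetical naming scheme: one file per (device, library version) pair,
// so a file written by a different version is never looked up.
inline std::string cache_file_name(const std::string& device_id,
                                   const std::string& version)
{
  return device_id + "_" + version + ".bin";
}
```

A caller would pass `std::getenv("HOME")` (checked for null) as `home`, pick a directory with `first_usable_dir`, and fall back to compiling without a cache if the returned path is empty.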
A couple of other minor changes here:
- Clean up some warnings.
- Use CUBIN instead of PTX for CUDA kernels. (This is a fully compiled binary kernel instead of an IR.)
I'll leave this open for a few days before merging, in case anyone has any comments.