1. 30 Apr, 2017 1 commit
      Use more efficient OpenCL memory mapping · dde1302f
      Ondrej Mosnáček authored
      This commit changes the memory layout to be lane-interleaved, so that
      we can map only the necessary parts of GPU memory with OpenCL.
      Previously we mapped the whole memory buffer, which was very slow.
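A minimal sketch of why a lane-interleaved layout helps with partial mapping, not the commit's actual code. It assumes that the host only needs to touch the first two blocks of each lane (to write the initial blocks) and the last block of each lane (to read the result), and that "lane-interleaved" means blocks of the same column from different lanes sit next to each other. The function names and the `(lane, column)` indexing are hypothetical; only the 1024-byte block size comes from the Argon2 specification.

```python
ARGON2_BLOCK_SIZE = 1024  # bytes per Argon2 block (Argon2 specification)


def lane_major_offset(lane, column, lanes, lane_length):
    """Byte offset when each lane is stored contiguously (assumed old layout)."""
    return (lane * lane_length + column) * ARGON2_BLOCK_SIZE


def lane_interleaved_offset(lane, column, lanes, lane_length):
    """Byte offset when same-column blocks of all lanes are adjacent (assumed new layout)."""
    return (column * lanes + lane) * ARGON2_BLOCK_SIZE


def contiguous_ranges(offsets):
    """Merge block offsets into sorted (start, length) byte ranges."""
    ranges = []
    for off in sorted(offsets):
        if ranges and ranges[-1][0] + ranges[-1][1] == off:
            start, length = ranges[-1]
            ranges[-1] = (start, length + ARGON2_BLOCK_SIZE)
        else:
            ranges.append((off, ARGON2_BLOCK_SIZE))
    return ranges


def host_ranges(offset_fn, lanes, lane_length):
    """Byte ranges the host would have to map under the assumptions above."""
    columns = (0, 1, lane_length - 1)  # two initial blocks and the final block
    offsets = [
        offset_fn(lane, column, lanes, lane_length)
        for lane in range(lanes)
        for column in columns
    ]
    return contiguous_ranges(offsets)
```

With 4 lanes of 16 blocks each, `host_ranges(lane_major_offset, 4, 16)` yields five separate ranges scattered across the buffer, while `host_ranges(lane_interleaved_offset, 4, 16)` yields two: one at the start of the buffer and one at the end. Each range could then be mapped on its own instead of mapping the whole buffer.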
  2. 13 Mar, 2017 1 commit
      Tune both lanes and jobs per block · 2c505e66
      Ondrej Mosnáček authored
      This commit extends block size tuning to both lanes and jobs (for oneshot
      kernel, only jobs per block are tuned). For now, we only tune jobs per block
      if lanes per block was tuned to its maximum value.
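The tuning order described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `measure` is a hypothetical benchmark callback returning a run time, the search space is assumed to be powers of two, and the oneshot kernel is assumed to always use all lanes in one block.

```python
def powers_of_two(limit):
    """Yield 1, 2, 4, ... up to and including limit (assumed search space)."""
    value = 1
    while value <= limit:
        yield value
        value *= 2


def tune(measure, lanes, max_jobs_per_block, oneshot=False):
    """Return (lanes_per_block, jobs_per_block) with the lowest measured time.

    measure(lanes_per_block, jobs_per_block) is a hypothetical callback that
    runs the kernel with the given block shape and returns the elapsed time.
    """
    if oneshot:
        # Assumption: the oneshot kernel has no lanes-per-block choice to tune.
        lanes_per_block = lanes
        lanes_at_max = True
    else:
        options = list(powers_of_two(lanes))
        lanes_per_block = min(options, key=lambda lpb: measure(lpb, 1))
        lanes_at_max = lanes_per_block == options[-1]

    jobs_per_block = 1
    if lanes_at_max:
        # Jobs per block are tuned only once lanes per block reached its maximum.
        jobs_per_block = min(
            powers_of_two(max_jobs_per_block),
            key=lambda jpb: measure(lanes_per_block, jpb),
        )
    return lanes_per_block, jobs_per_block
```

Tuning the two parameters one after the other keeps the number of benchmark runs linear in the number of candidates rather than quadratic, at the cost of possibly missing a better combination.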
  3. 11 Mar, 2017 2 commits
  4. 27 Jan, 2017 1 commit