1. 26 Mar, 2019 6 commits
  2. 19 Mar, 2019 1 commit
      [CUDA,OpenCL] Use RAM buffer for in/out blocks · 58141506
      Ondrej Mosnáček authored
      This allows the CPU pre-/post-processing of a password batch to be
      done in parallel with the GPU computation. This means we can now
      assume the BLAKE2 computation cost to be hidden behind the GPU
      computation time (for real).
      
      The only overhead this adds to the GPU computation time is copying
      the data to/from the RAM buffer, which is fast thanks to the
      rectangular copy operations that are used. This should significantly
      affect only hashes with low cost parameters. For these, the benchmark
      tool was reporting overly optimistic times before this commit.
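The "rectangular copy" mentioned in the commit message above can be illustrated with a small host-only model. This is a minimal sketch under stated assumptions, not the project's code: `copy_rect` and `upload_inputs` are hypothetical names, the sizes are made up, and a plain `std::vector` stands in for GPU memory. The argument order of `copy_rect` follows `cudaMemcpy2D` (without the copy-kind argument); OpenCL offers a comparable operation in `clEnqueueWriteBufferRect`. The assumed layout is that each hash owns a large memory region of which only a small prefix is input, so a compact RAM buffer can be scattered into place with one strided copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `height` rows of `width` bytes each. Consecutive rows start
// `spitch` bytes apart in the source and `dpitch` bytes apart in the
// destination. This is a host-only model of a rectangular copy, not a
// real GPU transfer.
void copy_rect(void* dst, std::size_t dpitch,
               const void* src, std::size_t spitch,
               std::size_t width, std::size_t height)
{
    assert(width <= dpitch && width <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

// Hypothetical layout (an assumption for illustration): each hash owns
// `mem_per_hash` bytes of "device" memory, but only its first `in_bytes`
// are input produced by the CPU. The RAM buffer stores those inputs
// back to back, one row per hash.
std::vector<unsigned char> upload_inputs(const std::vector<unsigned char>& ram,
                                         std::size_t batch,
                                         std::size_t in_bytes,
                                         std::size_t mem_per_hash)
{
    assert(ram.size() == batch * in_bytes);
    std::vector<unsigned char> device(batch * mem_per_hash, 0);
    // One strided copy places every hash's input at the start of its region.
    copy_rect(device.data(), mem_per_hash,
              ram.data(), in_bytes,
              in_bytes, batch);
    return device;
}
```

With a batch of 3 hashes, 2 input bytes per hash, and 5 bytes of memory per hash, `upload_inputs` writes the three input rows at offsets 0, 5, and 10 of the destination and leaves the rest zeroed. Because the RAM side stays compact, the CPU can fill or drain it for one batch while another batch is being computed, which is the overlap the commit message describes.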
  3. 19 Dec, 2017 3 commits
  4. 19 Oct, 2017 1 commit
  5. 18 Oct, 2017 1 commit
  6. 17 Oct, 2017 2 commits
  7. 10 Oct, 2017 2 commits
  8. 07 Oct, 2017 2 commits
  9. 03 Oct, 2017 7 commits
  10. 29 Sep, 2017 6 commits
  11. 25 Sep, 2017 1 commit
  12. 21 Sep, 2017 5 commits
  13. 18 Sep, 2017 1 commit
      [CUDA] Do not use UVM · 74cfdfa1
      Ondrej Mosnáček authored
      Since using UVM may cause severe performance problems on multi-GPU
      systems, it is better to replace it with manual copying from CPU
      memory.
  14. 14 Sep, 2017 1 commit
  15. 13 Sep, 2017 1 commit