Add dot() operation. (!5) · Merge requests · bandicoot-lib / bandicoot-code

Ryan Curtin requested to merge rcurtin/bandicoot-code:dot into unstable Apr 18, 2020

This adds a reasonable, though not optimal, strategy for computing dot(), based on the existing two-stage accu(). The first stage computes a chunked dot product, producing one partial result per chunk; the second stage accumulates those partial results with a single thread. Eventually we should implement a generic, efficient reduce and use it in place of the dot_twostage (and accu_twostage) kernels, but this works for now and should have reasonable performance.
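For readers unfamiliar with the two-stage pattern, here is a minimal CPU-side sketch of the idea in plain C++. It is not the kernel code from this merge request: the function names (`dot_stage1_sketch`, `dot_stage2_sketch`, `dot_twostage_sketch`), the use of `std::vector<double>`, and the `num_chunks` parameter are all hypothetical and chosen only for illustration. An ordinary loop stands in for the parallel work that a GPU would do in the first stage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stage 1 (hypothetical sketch): split the inputs into num_chunks slices and
// compute one partial dot product per slice. On a GPU, each slice would be
// processed in parallel; a sequential loop stands in for that here.
std::vector<double> dot_stage1_sketch(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::size_t num_chunks)
{
  assert(a.size() == b.size());
  assert(num_chunks > 0);

  const std::size_t n = a.size();
  // Ceiling division, so that every element falls into some chunk.
  const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;

  std::vector<double> partial(num_chunks, 0.0);
  for (std::size_t c = 0; c < num_chunks; ++c)
  {
    // Clamp the slice to the vector length; trailing chunks may be empty.
    const std::size_t begin = std::min(n, c * chunk_size);
    const std::size_t end = std::min(n, begin + chunk_size);
    for (std::size_t i = begin; i < end; ++i)
      partial[c] += a[i] * b[i];
  }
  return partial;
}

// Stage 2 (hypothetical sketch): a single thread sums the partial results.
double dot_stage2_sketch(const std::vector<double>& partial)
{
  double result = 0.0;
  for (const double p : partial)
    result += p;
  return result;
}

// Convenience wrapper that runs both stages back to back.
double dot_twostage_sketch(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t num_chunks = 4)
{
  return dot_stage2_sketch(dot_stage1_sketch(a, b, num_chunks));
}
```

Calling `dot_twostage_sketch(a, b, 2)` on two equal-length vectors returns the same value as a straightforward single-loop dot product, up to floating-point summation order. The second stage is sequential, which is why a generic parallel reduce would likely be a better long-term replacement.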

This fixes #5 (closed).
