Implement `find()`
find()
needed to be implemented as a two-stage operation:
- first, count the number of nonzeros so that we can allocate memory for the result
- second, populate the result
I implemented three second-stage kernels depending on whether we are finding all nonzeros, the first nonzeros, or the last nonzeros.
Behavior should match Armadillo's.