Modify tensor argmin/argmax to always return the first occurrence.
When several elements share the minimum/maximum value, the index returned
by argmin/argmax currently depends on multithreading/GPU execution and is
therefore not stable. Here we modify the functors to always keep the first
occurrence (i.e. if a value is equal to the current min/max, keep the one
with the smallest index).
Without this change, some TF tests produce unpredictable results.
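A minimal sketch of the tie-breaking rule, assuming the functors combine
(index, value) pairs. The names IndexValue, FirstArgMinReducer,
FirstArgMaxReducer and ChunkedArgMin are hypothetical and do not reflect
the actual Eigen reducer types. ChunkedArgMin splits the input into chunks
(standing in for threads) and merges partial results in reverse order to
mimic an arbitrary completion order:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical (index, value) pair; not Eigen's actual tuple type.
using IndexValue = std::pair<std::ptrdiff_t, float>;

// Hypothetical argmin functor: on equal values, keep the smaller index.
struct FirstArgMinReducer {
  IndexValue operator()(const IndexValue& a, const IndexValue& b) const {
    if (a.second < b.second) return a;
    if (b.second < a.second) return b;
    return a.first <= b.first ? a : b;  // tie: first occurrence wins
  }
};

// Hypothetical argmax functor with the same tie-breaking rule.
struct FirstArgMaxReducer {
  IndexValue operator()(const IndexValue& a, const IndexValue& b) const {
    if (a.second > b.second) return a;
    if (b.second > a.second) return b;
    return a.first <= b.first ? a : b;  // tie: first occurrence wins
  }
};

// Reduces `data` in chunks of `chunk_size` (simulating per-thread work),
// then merges the partial results in reverse order.
// Preconditions: data is non-empty and chunk_size > 0. NaNs are not handled.
IndexValue ChunkedArgMin(const std::vector<float>& data,
                         std::size_t chunk_size) {
  FirstArgMinReducer reduce;
  std::vector<IndexValue> partials;
  for (std::size_t start = 0; start < data.size(); start += chunk_size) {
    std::size_t end = std::min(start + chunk_size, data.size());
    IndexValue acc{static_cast<std::ptrdiff_t>(start), data[start]};
    for (std::size_t i = start + 1; i < end; ++i) {
      acc = reduce(acc, {static_cast<std::ptrdiff_t>(i), data[i]});
    }
    partials.push_back(acc);
  }
  // Merge partials last-to-first; the result must not depend on this order.
  IndexValue result = partials.back();
  for (std::size_t k = partials.size() - 1; k-- > 0;) {
    result = reduce(result, partials[k]);
  }
  return result;
}
```

Because ties are broken by index, each functor picks the minimum (or
maximum) under a total order on (value, index). That makes the combine
step associative and commutative, so any chunking or merge order yields
the same index. For {3, 1, 4, 1, 5, 1}, ChunkedArgMin returns index 1 for
every chunk size.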