cKDTree based NN removal for multiple order magnitude speed gain
For large noisy images, peak_local_max
can return lots and lots of peaks for small separation distances. The default setting in get_atom_position
and get_feature_separation
is that the atoms too close together are removed. However, the implementation in _remove_too_close_atoms
relied on nested loops with expensive calculations repeated over and over in each loop. This made both get_atom_position
and get_feature_separation
extremely time consuming if you happen to start with bad settings. For a position list of 8300 peaks, the original implementation of _remove_too_close_atoms
with a separation of 50 took longer than 2 min on my laptop.
I changed the implementation to rely in cKDTrees which are a nice fast data structure to deal with nearest neighbor problems in n-dimensional point clouds. The code is maybe a bit less intuitive but I added comments to clarify. I also added two keyword arguments: intensity
representing the image intensity at each peak, to decide which peak to remove preferentially in a pair and max_iter
to guard against a (I think impossible) infinite loop situation. Using this approach I could get the same position list to run in 200 ms. Point clouds of hundreds of thousands of points can still be run in a few seconds, whereas this is totally impossible in the original implementation.
Outputs are nearly identical:
This is what the original point cloud looks like:
Here are the cleaned up clouds:
Blue is the original implementation, orange is the new implementation. There are only two points where there is no coincidence I think because the order in which atoms are removed can have a slight influence. Still I think the new implementation is equally valid (tests still pass) and much faster, which makes get_feature_separation
much more usable.