Skip to content

cKDTree based NN removal for multiple order magnitude speed gain

Niels Cautaerts requested to merge din14970/atomap:speed_atom_removal into master

For large noisy images, peak_local_max can return lots and lots of peaks for small separation distances. The default setting in get_atom_position and get_feature_separation is that the atoms too close together are removed. However, the implementation in _remove_too_close_atoms relied on nested loops with expensive calculations repeated over and over in each loop. This made both get_atom_position and get_feature_separation extremely time consuming if you happen to start with bad settings. For a position list of 8300 peaks, the original implementation of _remove_too_close_atoms with a separation of 50 took longer than 2 min on my laptop.

I changed the implementation to rely in cKDTrees which are a nice fast data structure to deal with nearest neighbor problems in n-dimensional point clouds. The code is maybe a bit less intuitive but I added comments to clarify. I also added two keyword arguments: intensity representing the image intensity at each peak, to decide which peak to remove preferentially in a pair and max_iter to guard against a (I think impossible) infinite loop situation. Using this approach I could get the same position list to run in 200 ms. Point clouds of hundreds of thousands of points can still be run in a few seconds, whereas this is totally impossible in the original implementation.

Outputs are nearly identical:

This is what the original point cloud looks like: image

Here are the cleaned up clouds: image

Blue is the original implementation, orange is the new implementation. There are only two points where there is no coincidence I think because the order in which atoms are removed can have a slight influence. Still I think the new implementation is equally valid (tests still pass) and much faster, which makes get_feature_separation much more usable.

Merge request reports