Identifying speed improvements to Gaussian2D refinement
Currently the Gaussian2D curve fitting takes too long (hours) on a large (2048) image. Using my test image from previous PRs as an example, it has 17000 atoms. If we want to finish fitting such an image in less than 5 minutes (a reasonable amount of time for impatient researchers), we would need each fitting operation to take no more than 18 ms. Currently it takes 250 ms. This may be a tough goal, but any improvement would be good.
I have already identified a few places that could improve things. The relevant code-section is around line 740 on atom_finding_refinement. I have profiled the _make_model_from_atom_list
function, which gave a few interesting revelations. The following are some points that we can fix:
- Remove unused mask creation call
- The fitting algorithm currently allows feeding it a list of atoms, rather than just one. Since a natural manner of cropping the dataset would be quite different in a case of only one atom (detailed below), I suggest creating a conditional which would fit in the previous manner when there are more atoms present, and in a new manner when there is only one atom present. As a precursor to fitting, several things currently happen:
- Turn data into floats (even if it is already)
- A mask with the shape of s.data is created, set to True in places that are within a radius of the atoms in the atom_list.
- The coordinates of the edges are calculated from where the the mask is True.
- Mask is cropped on the basis of these indices.
- Data is multiplied by the mask, then cropped by the same indices
- 1 should be moved outside the loop so that the data already is floats by the time it arrives. Barring that, it should at least check that the dtype isn't already float before calling the method. It takes over 30 ms to "convert" such a float64 image to float64. Actually, is turning it into floats even necessary?
- I suggest replacing 2-5 with a cropped array centered around the atom. Cropping (even with the padding that I introduced in !47 (merged) and am currently rewriting a lot faster) should take nano-microseconds rather than the 120ms it is currently taking.
- Currently, a circular mask is centered around each atom. I will investigate to see if we can keep doing this cheaply, or whether we should discuss not doing this.
-
@magnunor How much consideration did you give the initial values for
gaussian.A.value
? Currently it isupper_value*10
. Have you checked if this is a reasonable value?
Edited by Thomas Aarholt