[JOSS review] performance claim on deconvolution
A key claim of the JOSS submission is that STARRED is faster than previous codes. This should be demonstrated by benchmarking against a code that implements a similar technique (two-channel deconvolution) or one with comparable metrics in its final output.
On this point, I am somewhat concerned with the performance of the deconvolution algorithm as presented in the Jupyter notebooks. In particular, notebook "3_Another lensed quasar - joint deconvolution" illustrates the poor convergence of gradient descent: the loss history in cell "In [11]:" shows that even after 25,000 iterations the algorithm has not converged.

Optimizers like AdaBelief perform well in stochastic optimization, where the full loss function cannot be evaluated at once, and their slow convergence actually helps avoid overfitting individual batches. In the optimization problem faced by STARRED, however, the full loss can be evaluated at once: it is a chi^2 plus regularization over a manageable number of pixels. In this scenario the Levenberg-Marquardt (LM) algorithm is the typical method of choice (for a nice introduction to the algorithm, see this link). For example, GALFIT (Peng et al. 2010), which solves a similar image-fitting problem (though not deconvolution), uses LM as its core solver. LM is a second-order method, so its convergence accelerates near the minimum, whereas gradient descent slows down there. The difference in performance can be orders of magnitude when targeting a typical relative tolerance on the chi^2 (1e-3 is quite common).
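To make the contrast concrete, here is a minimal pure-Python sketch of a damped LM loop on a toy nonlinear least-squares problem. The data, model, and damping schedule are invented for illustration and have nothing to do with STARRED's actual deconvolution problem; the point is only that, starting far from the minimum, LM drives the chi^2 to machine precision in a few tens of iterations, where plain gradient descent would need orders of magnitude more.

```python
import math

# Toy example: fit y = a*exp(b*x) by least squares with a hand-rolled
# Levenberg-Marquardt loop. All names and values here are invented for
# the sketch (not from STARRED).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noiseless data from a=2, b=0.5

def residuals(a, b):
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def chi2(a, b):
    return sum(r * r for r in residuals(a, b))

def lm_step(a, b, lam):
    """One LM update: solve (J^T J + lam*I) delta = -J^T r (2x2, Cramer's rule)."""
    r = residuals(a, b)
    # Jacobian rows: (dr_i/da, dr_i/db) = (exp(b*x_i), a*x_i*exp(b*x_i))
    J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
    g11 = sum(j1 * j1 for j1, _ in J) + lam
    g12 = sum(j1 * j2 for j1, j2 in J)
    g22 = sum(j2 * j2 for _, j2 in J) + lam
    rhs1 = -sum(j1 * ri for (j1, _), ri in zip(J, r))
    rhs2 = -sum(j2 * ri for (_, j2), ri in zip(J, r))
    det = g11 * g22 - g12 * g12
    return a + (g22 * rhs1 - g12 * rhs2) / det, b + (g11 * rhs2 - g12 * rhs1) / det

a, b, lam = 1.0, 1.0, 1e-3  # deliberately poor starting guess
for it in range(50):
    a_try, b_try = lm_step(a, b, lam)
    if chi2(a_try, b_try) < chi2(a, b):
        a, b, lam = a_try, b_try, lam / 3.0  # accept: move toward Gauss-Newton
    else:
        lam *= 3.0                           # reject: move toward gradient descent
    if chi2(a, b) < 1e-14:
        break
print(f"after {it + 1} iterations: a={a:.6f}, b={b:.6f}, chi2={chi2(a, b):.3e}")
```

The acceptance/rejection rule on lam is the essential LM ingredient: large lam makes the step a small, safe gradient-descent-like move, while small lam recovers the fast quadratic convergence of Gauss-Newton near the minimum.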
There are LM implementations in JAX that may be worth examining, e.g. jaxopt: https://jaxopt.github.io/stable/_autosummary/jaxopt.LevenbergMarquardt.html (though I have not tested this one myself). Note that LM is a very memory-intensive algorithm, since it builds and solves the damped normal equations, so a custom implementation may prove necessary.