The most expensive computations in Deformetrica are kernel convolutions. To perform them, we offer two alternatives:
Using torch vectorized computations. This can be done on both CPU and GPU. This implementation is extremely fast but also extremely memory-hungry: it will cause a memory overflow for convolutions with several thousand points.
Using keops. This can be done on both CPU and GPU. This implementation is not as fast as torch on CPU, but about as fast as torch on GPU. Its memory consumption is much lower (linear in the number of points, whereas torch convolutions are quadratic in the number of points), so it can be used on meshes or images with millions of points. Note that we do not currently support the PyKeOps kernel on macOS. A short comparison sketch of the two approaches is given below.
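To make the memory trade-off concrete, here is a minimal, hypothetical sketch of the same Gaussian kernel convolution written both ways. The kernel classes actually used by Deformetrica differ, and all variable names below are illustrative. The torch version materializes the full pairwise kernel matrix, while the keops version keeps it symbolic and reduces it on the fly:

```python
# Hypothetical sketch of a Gaussian kernel convolution: sum_j exp(-|x_i - y_j|^2 / sigma^2) * p_j.
# Not Deformetrica's actual kernel code; names and shapes are illustrative only.
import torch
from pykeops.torch import LazyTensor

N, M, D = 5000, 5000, 3
sigma = 1.0
x = torch.rand(N, D)   # target points
y = torch.rand(M, D)   # source points
p = torch.rand(M, D)   # momenta attached to the source points

# 1) torch version: builds the full (N, M) kernel matrix -> quadratic memory.
K = torch.exp(-torch.cdist(x, y) ** 2 / sigma ** 2)
out_torch = K @ p

# 2) keops version: the (N, M) kernel matrix stays symbolic -> linear memory.
x_i = LazyTensor(x[:, None, :])                             # (N, 1, D)
y_j = LazyTensor(y[None, :, :])                             # (1, M, D)
K_ij = (-((x_i - y_j) ** 2).sum(-1) / sigma ** 2).exp()     # symbolic (N, M)
out_keops = K_ij @ p                                        # reduction over j, never materialized
```

The torch variant allocates an N-by-M matrix explicitly, which is why it overflows memory once N and M reach several thousand; the keops variant only ever stores the N-by-D result.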
Extensive optimization of the use of these kernels depending on the task and the hardware is on our roadmap for future releases. In the meantime, here is the behaviour of Deformetrica in the different cases (a minimal sketch of this dispatch logic follows the list):
If use-cuda is set to On, all torch computations will be done on the GPU.
If a keops kernel is used, it will automatically attempt to use a GPU backend. If no such backend is available, it will fall back to the CPU backend.
If the number-of-threads parameter is larger than one and a keops kernel is used with the GPU backend, then the number-of-threads parameter will be overridden to 1 and all computations will be carried out on the GPU.
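As an illustration only, the rules above roughly correspond to the following dispatch logic. This is a hypothetical sketch, not Deformetrica's actual code; only the use-cuda and number-of-threads parameter names are taken from the configuration described above.

```python
# Hypothetical sketch of the dispatch rules listed above; not Deformetrica's actual code.
import torch

def select_backend(use_cuda, kernel_type, number_of_threads):
    # Rule 1: use-cuda = On moves all torch computations to the GPU.
    torch_device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'

    keops_backend = None
    if kernel_type == 'keops':
        # Rule 2: a keops kernel tries the GPU first and falls back to the CPU.
        keops_backend = 'GPU' if torch.cuda.is_available() else 'CPU'
        # Rule 3: multi-threading is disabled when keops runs on the GPU.
        if keops_backend == 'GPU' and number_of_threads > 1:
            number_of_threads = 1

    return torch_device, keops_backend, number_of_threads

# Example: with a keops kernel on a CUDA-capable machine, the requested 4 threads collapse to 1.
device, backend, n_threads = select_backend(use_cuda=True, kernel_type='keops', number_of_threads=4)
```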
Don't hesitate to reach out to us with questions or suggestions about these kernel operations and the multi-threading of the model computations.