Speed up GPU version by reducing allocations
Avoid allocating and deallocating temporary batches in the exponential and projection buffers of the Hamiltonian. For the GPU version (using AETRS for TD mode), this gives a large speed-up, up to a factor of 5, and even more when using more than one GPU. The temporary buffers are reinitialized if their requested size changes.
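
For reviewers, the caching pattern can be sketched roughly as follows. This is an illustration in C++, not the Fortran code in this request: `ScratchBuffer`, `acquire`, and `allocations` are hypothetical names, and a host-side `std::vector` stands in for device memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scratch buffer (not an Octopus type): it keeps its storage
// alive between calls and reinitializes it only when the requested size
// differs from the cached one.
class ScratchBuffer {
public:
    // Return storage for n elements, reusing the cached storage if possible.
    double* acquire(std::size_t n) {
        if (n != size_) {
            storage_ = std::vector<double>(n);  // size changed: reinitialize
            size_ = n;
            ++allocations_;
        }
        return storage_.data();
    }

    // Number of times the storage was (re)initialized.
    std::size_t allocations() const { return allocations_; }

private:
    std::vector<double> storage_;
    std::size_t size_ = 0;
    std::size_t allocations_ = 0;
};
```

Repeated calls with the same size return the same storage, so a loop over time steps pays for the allocation once instead of once per step.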
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.