[Performance] Reduce peak memory consumption
Some distinguishers perform operations like A += (B @ C) with large matrices. This causes peak memory consumption of about twice the normal size of the data stored in the distinguisher. Indeed, just before the in-place sum, both A and the temporary result (B @ C), which has the same shape as A, are held in memory.
To avoid that, we could use the low-level BLAS functions dgemm (double precision) and sgemm (single precision). These routines perform both the matrix product and the addition in a single operation, accumulating directly into A without allocating the temporary, so the memory peak is avoided.
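
A minimal sketch of the idea, assuming SciPy's BLAS wrappers (scipy.linalg.blas) are available and that A, B and C are C-contiguous 2-D NumPy arrays of the same float dtype. The helper name gemm_accumulate is hypothetical and is not part of the existing code base. The BLAS wrappers only update their output in place when it is Fortran-contiguous, so the sketch works on the transposed views, using (B @ C).T == C.T @ B.T.

```python
import numpy as np
from scipy.linalg import blas


def gemm_accumulate(a, b, c):
    """Compute a += b @ c without allocating a temporary for b @ c.

    Hypothetical helper for illustration only. Assumes a, b and c are
    C-contiguous 2-D arrays sharing the same dtype (float32 or float64).
    """
    if a.dtype == np.float64:
        gemm = blas.dgemm
    elif a.dtype == np.float32:
        gemm = blas.sgemm
    else:
        raise TypeError("only float32 and float64 are handled in this sketch")
    if b.dtype != a.dtype or c.dtype != a.dtype:
        raise TypeError("a, b and c must share the same dtype")
    if not all(x.flags.c_contiguous for x in (a, b, c)):
        raise ValueError("a, b and c must be C-contiguous")

    # The transpose of a C-contiguous array is a Fortran-contiguous view,
    # so no copy is made here. gemm computes alpha * (c.T @ b.T) + beta * a.T.
    out = gemm(1.0, c.T, b.T, beta=1.0, c=a.T, overwrite_c=1)

    # Fallback (assumption: should not trigger for the inputs checked above):
    # if the wrapper had to copy, write the result back into a.
    if not np.shares_memory(out, a):
        a[...] = out.T
    return a


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))
expected = A + B @ C
gemm_accumulate(A, B, C)
print(np.allclose(A, expected))
```

If the inputs are not C-contiguous or have mixed dtypes, the wrapper would silently copy them, which defeats the purpose, so this sketch raises instead.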