Re-think handling of derivatives in the code
For the M06-2X functional on a Tesla P100:
| LibXC build | Points (millions per second) |
|---|---|
| without any derivatives | 84.5 |
| with 1st and 2nd derivatives | 62.2 |
This seems to suggest that the order of the compiled-in derivatives has a significant effect on the speed of libxc: throughput drops by roughly 26% (84.5 to 62.2 million points per second) when 1st and 2nd derivatives are compiled in. Enabling compilation of the derivatives increases the number of variables in the code, requiring a larger stack size, even though the derivatives don't get evaluated for an energy-only evaluation.
These results suggest that we should change the way derivative support is handled in libxc. A better solution might be to compile several variants of each routine:
- energy-only evaluation
- energy and first derivatives
- energy and first and second derivatives
- energy and first, second, and third derivatives
- energy and first, second, third, and fourth derivatives
This way we could also eliminate statements like `if(order < 1) return;` from the kernel code.
Since the size of the code increases rapidly with the derivative order, the lower-order variants are small compared with the highest-order one, so compiling and linking them in as well should be perfectly feasible. This is also the approach used in e.g. NWChem, where everything is compiled several times with different derivative flags. Moreover, the order could then be chosen at the top level instead of within the loop over grid points.
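To make the proposal concrete, here is a minimal sketch of what "one variant per derivative order, chosen at top level" could look like. It is not libxc code: the toy functional e(rho) = rho^3 and all names (`DEFINE_KERNEL`, `toy_exc`, `toy_vxc`, `toy_fxc`, `toy_eval`) are made up for illustration, and a macro stands in for recompiling the same source with different derivative flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: energy e, first derivative v, second derivative f. */
typedef void (*kernel_fn)(size_t np, const double *rho,
                          double *e, double *v, double *f);

/* Stamp out one kernel per derivative order. ORDER is a compile-time
 * constant inside each variant, so the compiler can drop the unused
 * derivative code (and its temporaries) entirely. */
#define DEFINE_KERNEL(NAME, ORDER)                          \
  static void NAME(size_t np, const double *rho,            \
                   double *e, double *v, double *f)         \
  {                                                         \
    (void)v; (void)f; /* unused in low-order variants */    \
    for (size_t i = 0; i < np; i++) {                       \
      const double r = rho[i];                              \
      e[i] = r * r * r;               /* toy e = rho^3 */   \
      if (ORDER >= 1) v[i] = 3.0 * r * r;                   \
      if (ORDER >= 2) f[i] = 6.0 * r;                       \
    }                                                       \
  }

DEFINE_KERNEL(toy_exc, 0) /* energy only                         */
DEFINE_KERNEL(toy_vxc, 1) /* energy + first derivative           */
DEFINE_KERNEL(toy_fxc, 2) /* energy + first + second derivatives */

static const kernel_fn toy_kernels[] = { toy_exc, toy_vxc, toy_fxc };

/* Top-level dispatch: the order is checked once, outside the loop
 * over grid points. Returns 0 on success, -1 if the requested order
 * was not compiled in. */
int toy_eval(int order, size_t np, const double *rho,
             double *e, double *v, double *f)
{
  assert(rho != NULL && e != NULL);
  if (order < 0 || order > 2) return -1;
  toy_kernels[order](np, rho, e, v, f);
  return 0;
}
```

Calling `toy_eval(0, np, rho, e, NULL, NULL)` runs the energy-only variant and never touches the derivative arrays, while `toy_eval(2, ...)` fills all three. Whether a scheme like this recovers the energy-only throughput on the GPU would still have to be measured.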