Further performance improvements
Even after a reasonable number of performance improvements made in the speedup
branch, there are still some left to do for plane-wave-like models and for general performance:
- Direct double pointer access to the parameters, saving on element.pyx:522(set_double_value) and matrixfill.pyx:242(update_parameter_values).
- Caching the RHS when things like the laser aren't changing, so that whole stack of functions isn't called repeatedly for no good reason. Probably tag the nodes that change in the RHS, then just zero everything and memcpy those changing nodes.
- Validators: These either need to be Cython functions that are eval'd super quick, or we should check bounds at the start when doing standard axis sweeps: a dry run that flags any potential issues before even starting the simulation. Getting rid of validators during a run means we can have direct pointer access to those too.
- Symbols: These should be Cythonised, maybe even look into Cython JIT/inlining for repeated calls.
  - https://github.com/codeplea/tinyexpr seems like a nice math evaluator. For simple symbols, which the majority are, this could be a fast way to avoid the lambda calls in the current Parameter lambdification.
- Actions: Cythonise these so they can do quick C calls; this will probably be an issue for things like pre/post steps.
- Node disabling: With the new workspace-connections interface we can now easily null particular connections and not handle them, for example all the signal/mech nodes at mirrors that have no injections or connections.
- Model building and initialisation: I haven't even looked into these yet. However, speeding them up is probably low priority now that we can perform multiple actions per build.
- KLU speed up: There might be some wins in `klu.solve` by altering it to accept a sparse RHS input vector. Our RHS inputs are always pretty empty. If KLU isn't doing anything like this already then we'd obviously need a custom library.
- `DetectorWorkspace` should hold a direct `void*`, stride, and index to store data directly. Or we pass `get_output` a `void*` to save the data in, rather than constantly going back and forth between C and Python datatypes.
- Workspaces will eventually be converted entirely to C structs. The idea would be to release the GIL and allow for parallel filling, output calculations, and generally minimal Python shenanigans.
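
As a rough illustration of the RHS caching idea, here is a minimal pure-Python sketch. Everything in it is hypothetical: `CachedRHS`, `tag_changing`, `refill` and `fill_entry` are made-up names, not existing Finesse API. It is also a slight variation on the zero-then-memcpy approach: the cached vector is kept between steps and only the tagged entries are overwritten. A real version would live in Cython and operate on the actual RHS buffer.

```python
class CachedRHS:
    """Hypothetical helper: keeps the RHS vector between steps and
    refills only the entries that have been tagged as changing."""

    def __init__(self, size, fill_entry):
        self.rhs = [0j] * size        # stand-in for the real RHS buffer
        self.fill_entry = fill_entry  # assumed callable: index -> complex value
        self.changing = set()         # indices tagged as changing
        self.fill_calls = 0           # counts how often fill_entry is called
        self._filled = False

    def tag_changing(self, index):
        self.changing.add(index)

    def refill(self):
        if not self._filled:
            # First call: every entry has to be computed once.
            indices = range(len(self.rhs))
            self._filled = True
        else:
            # Later calls: only touch entries tagged as changing.
            indices = sorted(self.changing)
        for i in indices:
            self.rhs[i] = self.fill_entry(i)
            self.fill_calls += 1
        return self.rhs


# Toy usage: a single "laser" injecting at entry 0, nothing anywhere else.
source = {"amplitude": 1.0}


def fill_entry(index):
    return complex(source["amplitude"]) if index == 0 else 0j


cache = CachedRHS(size=6, fill_entry=fill_entry)
cache.refill()              # fills all 6 entries
cache.tag_changing(0)
source["amplitude"] = 2.0
cache.refill()              # recomputes entry 0 only
```

In this toy case the second `refill` makes one call instead of six; the saving in practice depends on how much of the RHS actually changes during a sweep.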
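
The dry-run bounds check suggested for validators could look roughly like the sketch below. It assumes a validator is a plain callable that raises `ValueError` on a bad value; `dry_run_check` and `validate_reflectivity` are hypothetical names used only for illustration, not existing Finesse functions.

```python
def dry_run_check(axis_values, validator):
    """Run the validator over every sweep value up front and collect
    (index, value, message) for each value it rejects."""
    problems = []
    for i, value in enumerate(axis_values):
        try:
            validator(value)
        except ValueError as exc:
            problems.append((i, value, str(exc)))
    return problems


def validate_reflectivity(value):
    # Hypothetical validator: a reflectivity must lie in [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"reflectivity {value} outside [0, 1]")
    return value


sweep = [0.0, 0.5, 1.0, 1.5]
problems = dry_run_check(sweep, validate_reflectivity)
if problems:
    for index, value, message in problems:
        print(f"sweep point {index}: {message}")
```

If the dry run comes back empty, the validators could be skipped for the rest of that sweep, which is what would allow direct pointer access to the validated parameters during the run.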