Add LARFG implementation and LAPACK bones
As a start towards #6, I began to implement/adapt code from clMAGMA. Eigenvalue decompositions (symmetric ones at least) are done via SYEVD, which in turn depends on LATRD, which in turn depends on LARFG.
So, this is a quick implementation of LARFG, plus all the auxiliary code necessary to make implementing further LAPACK functionality easy.
I'm not sure how this implementation will perform, but it at least works. I think the time for performance tuning is once the final SYEVD implementation is done: then we can compare the top-level eigenvalue decomposition against other toolkits, and if LARFG is slow, it will show up as a bottleneck.
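For reviewers who don't have LARFG in their head, here is a minimal reference sketch of what the routine computes, written in plain Python rather than taken from the code in this PR. It follows the standard LAPACK definition: given a scalar `alpha` and a vector `x`, produce `beta`, `tau` and `v` such that `H = I - tau * [1; v] * [1; v]^T` maps `[alpha; x]` to `[beta; 0]`. The names `larfg` and `apply_reflector` are made up for illustration, and the small-value rescaling is deliberately left out.

```python
import math

def larfg(alpha, x):
    """Generate an elementary Householder reflector (real case, no rescaling).

    Returns (beta, tau, v) such that H = I - tau * w * w^T with w = [1] + v
    satisfies H @ [alpha] + x == [beta, 0, ..., 0].
    """
    xnorm = math.sqrt(sum(xi * xi for xi in x))
    if xnorm == 0.0:
        # x is already zero, so H is the identity.
        return alpha, 0.0, list(x)
    # beta takes the opposite sign of alpha to avoid cancellation in alpha - beta.
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [xi * scale for xi in x]
    return beta, tau, v

def apply_reflector(tau, v, y):
    """Apply H = I - tau * w * w^T (w = [1] + v) to the vector y."""
    w = [1.0] + list(v)
    s = tau * sum(wi * yi for wi, yi in zip(w, y))
    return [yi - s * wi for wi, yi in zip(w, y)]

if __name__ == "__main__":
    beta, tau, v = larfg(3.0, [4.0])
    print(beta, tau, v)
    print(apply_reflector(tau, v, [3.0, 4.0]))
```

Applying the reflector back to the original vector is a cheap way to sanity-check any implementation: everything below the first entry should come out as zero up to rounding.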
This PR also fixes some minor bugs I discovered in the dot() implementation.
LARFG has support for rescaling when the given vector's values are too small, but I commented it out: dot() has no such handling, so it isn't possible to actually trigger the scaling condition I implemented. :) That code could be uncommented once dot() is more robust, perhaps.
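To illustrate the small-value issue, here is a rough Python sketch of the rescaling loop as reference LAPACK does it, again not the code in this PR. It assumes the underlying problem is that a norm built on a plain sum of squares underflows to zero for tiny inputs. `naive_norm`, `robust_norm` and `larfg_scaled` are hypothetical names, and `SAFMIN` only approximates LAPACK's threshold (it may differ by a factor of two depending on how epsilon is defined).

```python
import math
import sys

# Approximation of LAPACK's safmin / eps threshold (assumption, see above).
SAFMIN = sys.float_info.min / sys.float_info.epsilon

def naive_norm(x):
    """Norm via a plain sum of squares; underflows to 0 for tiny entries."""
    return math.sqrt(sum(xi * xi for xi in x))

def robust_norm(x):
    """Norm that divides by the largest magnitude first to avoid underflow."""
    m = max((abs(xi) for xi in x), default=0.0)
    if m == 0.0:
        return 0.0
    return m * math.sqrt(sum((xi / m) ** 2 for xi in x))

def larfg_scaled(alpha, x):
    """Householder generation with the rescaling branch for tiny inputs."""
    x = list(x)
    xnorm = robust_norm(x)
    if xnorm == 0.0:
        return alpha, 0.0, x
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    knt = 0
    if abs(beta) < SAFMIN:
        # Scale everything up until beta is safely representable.
        rsafmn = 1.0 / SAFMIN
        while abs(beta) < SAFMIN and knt < 20:
            knt += 1
            x = [xi * rsafmn for xi in x]
            beta *= rsafmn
            alpha *= rsafmn
        # Recompute from the rescaled values.
        xnorm = robust_norm(x)
        beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    tau = (beta - alpha) / beta
    v = [xi / (alpha - beta) for xi in x]
    # Undo the scaling on beta only; tau and v are scale-invariant.
    for _ in range(knt):
        beta *= SAFMIN
    return beta, tau, v

if __name__ == "__main__":
    print(naive_norm([4e-300]))           # sum of squares underflows
    print(larfg_scaled(3e-300, [4e-300]))
```

With a naive norm the routine sees a zero vector and returns early, so the `abs(beta) < SAFMIN` branch is never reached. That would match the situation described above, where the scaling code cannot be exercised until the norm computation itself is robust.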