Further improvements in the reference BLAS routines.
DSYRK is improved, which speeds up R's crossprod and tcrossprod.
DGEMM is improved for all cases except when the first operand is
transposed but the second is not.
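As a rough guide to which R operations exercise these routines, the sketch below lists common matrix products along with the BLAS call each is expected to reach. The mapping assumes GNU R's usual matrix-product code paths, where single-argument crossprod/tcrossprod use DSYRK and the two-argument forms and %*% use DGEMM. Other builds, or inputs containing NA/NaN, may be handled differently. The matrices x and y are arbitrary illustrative data.

```r
# Sketch: R matrix products and the BLAS routine each is assumed to call
# (GNU R's standard code paths; other builds may dispatch differently).
set.seed(1)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)
y <- matrix(rnorm(8),  nrow = 4, ncol = 2)

s1 <- crossprod(x)       # t(x) %*% x, symmetric result: DSYRK
s2 <- tcrossprod(x)      # x %*% t(x), symmetric result: DSYRK
g1 <- x %*% t(x)         # neither operand transposed inside BLAS: DGEMM "N","N"
g2 <- crossprod(x, y)    # first operand transposed: DGEMM "T","N"
g3 <- tcrossprod(x, x)   # second operand transposed: DGEMM "N","T"

# All routes must agree numerically with the explicit products.
stopifnot(isTRUE(all.equal(s1, t(x) %*% x)),
          isTRUE(all.equal(s2, g1)),
          isTRUE(all.equal(g2, t(x) %*% y)),
          isTRUE(all.equal(g3, g1)))
```

Under these assumptions, crossprod(x, y) is the DGEMM case not covered by the improvement above.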
Also fixes the bug illustrated below:
a <- matrix(c(NA,1,0,1),2,2,byrow=T)