Slow copy to triangularView

User "peddie" on the Eigen Discord reports that copying to a triangularView is slow and asks:

"the documentation for intel MKL / VML integration shows an example snippet of vectorised trig functions and then says "In the examples, [the operands in the example] are dense vectors." does "vectors" here imply that calling element-wise trig on ArrayXd will be vectorised if VML is available, but ArrayXXd will not? related question, if I do an assignment like A.triangularView<Eigen::Lower>() = B.array().square(), what does the fact that it's evaluating to a triangle imply about how the square() is computed? will it still be done via SIMD instructions?"

We should fix the triangularView assignment operator so that it evaluates the right-hand side with SIMD instructions where possible.
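
One possible reading of "use SIMD where possible", sketched outside of Eigen: in column-major storage the lower-triangular part of column j (rows j to n-1) is a contiguous run, so the assignment can be done as one unit-stride loop per column, which a compiler can auto-vectorize. The helper name `assign_lower_square` and the raw `std::vector` layout are assumptions made for illustration only. This is not Eigen's implementation and not a proposed patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Eigen: writes b(i,j)^2 into the lower
// triangle (i >= j) of the column-major n-by-n matrix `a`. The strictly
// upper part of `a` is left untouched, as with a triangular-view assignment.
void assign_lower_square(std::vector<double>& a,
                         const std::vector<double>& b,
                         std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        // Rows j..n-1 of column j are contiguous in column-major storage,
        // so the inner loop is a plain unit-stride loop over `len` doubles.
        double* dst = a.data() + j * n + j;
        const double* src = b.data() + j * n + j;
        const std::size_t len = n - j;
        for (std::size_t k = 0; k < len; ++k) {
            dst[k] = src[k] * src[k];
        }
    }
}
```

Until the assignment path is changed, a per-column workaround along the same lines might be written in user code as `A.col(j).tail(n - j) = B.col(j).tail(n - j).array().square();` inside a loop over `j`. This is untested here, and whether it is actually faster would need to be measured.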