I do something like this to get:
bMRes += MatrixXd(n, n).setZero()
This gets me an incrementation of
bM * bM.transpose() but twice as fast.
bM are of type
To optimize things further, I would like to skip the copy (and incrementation) of the Lower part.
In other words, I would like to compute and write only the Upper part.
Again, in other words, I would like to have my result in the Upper part and 0's in the Lower part.
If it is not clear enough, feel free to ask questions.
Thanks in advance.
bMRes is self-adjoint originally, you could use the following code, which only updates the upper half of
If not, I think you have to accept that
.selfadjointView<>() will always copy the other half when assigned to a
.rankUpdate(A), the cost of copying half of
A can be ignored when
A is reasonably large. So I guess you don't need to optimize your code further.
If you just want to evaluate the difference, you could use low-level BLAS APIs.
A*A.transpose() is equivalent to
.rankUpdate(A) is equivalent to
syrk() don't copy the other half automatically.