If B is sparse, it can be efficient (i.e., O(n) per solve, provided B is well conditioned) to solve for x_i in
B x_i = a_i
(sample conjugate gradient code is listed on Wikipedia). Taking the a_i to be the columns of A, you obtain the matrix B^{-1} A in O(n^2). Then you can sum its diagonal elements to get the trace. As a rule, this sparse inverse multiplication is easier than computing a complete set of eigenvalues. For comparison, a Cholesky decomposition is O(n^3) (see Darren Engwird's comment below about Cholesky).
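A minimal Python sketch of this column-by-column approach (mine, not from the original answer), assuming B is a symmetric positive-definite SciPy sparse matrix (so conjugate gradient applies) and A is a dense NumPy array; the function name trace_Binv_A is made up for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def trace_Binv_A(A, B):
    """Exact tr(B^{-1} A): one sparse CG solve per column of A."""
    n = A.shape[0]
    trace = 0.0
    for i in range(n):
        x_i, info = cg(B, A[:, i])   # solve B x_i = a_i for the i-th column a_i of A
        if info != 0:
            raise RuntimeError(f"CG failed to converge on column {i}")
        trace += x_i[i]              # only the diagonal entry (B^{-1} A)_{ii} is needed
    return trace

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    B = sp.random(n, n, density=0.02, random_state=0)
    B = (B @ B.T + n * sp.identity(n)).tocsr()        # sparse, SPD, well conditioned
    A = rng.standard_normal((n, n))
    print(trace_Binv_A(A, B))
    print(np.trace(np.linalg.solve(B.toarray(), A)))  # dense check, agrees up to CG tolerance
```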
If you only need a rough estimate of the trace, you can actually reduce the cost to O(qn) by averaging
r^T (A B^{-1}) r
over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy
< r_i r_j > = \delta_{ij}
where < ... > denotes an average over the distribution of r. For example, the components r_i can be independent Gaussians with unit variance, or they can be chosen uniformly from ±1. As a rule, the trace scales as O(n), and the error in the trace estimate scales as O(sqrt(n/q)), so the relative error scales as O(sqrt(1/(nq))).
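A corresponding sketch of this randomized estimator (again mine, not from the answer), reusing the same assumptions about A and B and using ±1 (Rademacher) vectors for r:

```python
import numpy as np
from scipy.sparse.linalg import cg

def trace_estimate(A, B, q=30, seed=None):
    """Estimate tr(A B^{-1}) by averaging r^T A B^{-1} r over q random +/-1 vectors."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(q):
        r = rng.choice([-1.0, 1.0], size=n)  # components satisfy <r_i r_j> = delta_ij
        y, info = cg(B, r)                   # y = B^{-1} r, one sparse solve per sample
        if info != 0:
            raise RuntimeError("CG failed to converge")
        total += r @ (A @ y)                 # r^T A B^{-1} r
    return total / q                         # unbiased estimate of the trace
```

Each sample costs one sparse solve, so the whole estimate is O(qn) rather than the O(n^2) of the exact column-by-column computation, at the price of the O(sqrt(n/q)) error described above.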