If B is sparse, it can be efficient (i.e., O(n), provided B is well conditioned) to solve for x_i in B x_i = a_i (sample conjugate gradient code is listed on Wikipedia). Taking the a_i to be the columns of A, you obtain the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace. As a rule, this sparse inverse multiplication is cheaper than computing a complete set of eigenvalues. For comparison, Cholesky decomposition is O(n^3). (See Darren Engwird's comment below about Cholesky.)
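For concreteness, here is a minimal sketch of the column-by-column approach in Python with SciPy, assuming B is sparse and symmetric positive definite (which conjugate gradient requires) and A is a dense array. The function and variable names (trace_of_Binv_A, etc.) are illustrative, not from the original post.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def trace_of_Binv_A(A, B):
    """Compute trace(B^{-1} A) by solving B x_i = a_i for each column a_i of A."""
    n = A.shape[0]
    trace = 0.0
    for i in range(n):
        a_i = A[:, i]                    # i-th column of A (A assumed dense here)
        x_i, info = cg(B, a_i)           # x_i = B^{-1} a_i via conjugate gradient
        if info != 0:
            raise RuntimeError(f"CG did not converge for column {i} (info={info})")
        trace += x_i[i]                  # only the diagonal entry of B^{-1} A is needed
    return trace

# Small demonstration with a random sparse SPD matrix B and a dense A.
rng = np.random.default_rng(0)
n = 200
B = sp.random(n, n, density=0.02, random_state=rng)
B = sp.csr_matrix(B @ B.T + sp.identity(n))      # make B symmetric positive definite
A = rng.standard_normal((n, n))

approx = trace_of_Binv_A(A, B)
exact = np.trace(np.linalg.solve(B.toarray(), A))
print(approx, exact)                             # the two should agree to several digits
```

Note that only the i-th entry of each solve is actually used, so the full matrix B^{-1} A never needs to be stored.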
If you only need a rough estimate of the trace, you can reduce the cost to O(qn) by averaging r^T (A B^{-1}) r over q random vectors r, where usually q << n. This is an unbiased estimate provided the components of the random vector r satisfy < r_i r_j > = \delta_{ij}, where < ... > denotes the average over the distribution of r. For example, the components r_i can be independent Gaussians with unit variance, or they can be chosen uniformly from ±1. As a rule, the trace scales as O(n) and the error in the trace estimate scales as O(sqrt(n/q)), so the relative error scales as O(sqrt(1/(nq))).
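Below is a minimal sketch of this stochastic estimator using ±1 (Rademacher) probe vectors, which satisfy < r_i r_j > = \delta_{ij}. It again assumes a sparse, symmetric positive definite B so that conjugate gradient can be used for the solves; estimate_trace, q, and the test matrices are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def estimate_trace(A, B, q, rng):
    """Estimate trace(A B^{-1}) as the average of r^T A B^{-1} r over q random vectors."""
    n = A.shape[0]
    total = 0.0
    for _ in range(q):
        r = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe: <r_i r_j> = delta_ij
        y, info = cg(B, r)                    # y = B^{-1} r, one sparse solve per probe
        if info != 0:
            raise RuntimeError(f"CG did not converge (info={info})")
        total += r @ (A @ y)                  # r^T A B^{-1} r
    return total / q

rng = np.random.default_rng(0)
n = 200
B = sp.random(n, n, density=0.02, random_state=rng)
B = sp.csr_matrix(B @ B.T + sp.identity(n))   # symmetric positive definite, so CG applies
A = rng.standard_normal((n, n)) + np.eye(n)

est = estimate_trace(A, B, q=50, rng=rng)
exact = np.trace(A @ np.linalg.inv(B.toarray()))
print(est, exact)   # rough agreement; the estimate is unbiased and improves with larger q
```

Each probe costs one sparse solve plus one matrix-vector product, so the total cost is O(qn) when the solves are O(n), as stated above.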