What is the best way to multiply a large and sparse matrix with its transposition?

Currently, I want to multiply a large sparse matrix (~ 1M x 200k) with its transposition. The values ​​of the resulting matrix will be in float.

  • I tried loading the matrix into a lean sparse matrix and multiplying each row of the first matrix by the second matrix. Multiplication took ~ 2 hours.

What is an effective way to achieve this multiplication? Because I see the pattern in the calculation.

  • The matrix is ​​large and sparse.
  • Multiplication of a matrix with its transposition. Thus, the resulting matrix will be symmetric.

I would like to know which libraries can calculate faster. It can be in Python, R, C, C ++ or any other.

+4
source share
2 answers

I believe that your main need is to save memory. First, when you multiply a matrix with its transpose, you do not need any memoory to transpose: all its cells are directly accessible through the first matrix (tA [i, j] = A [j, i]). Almost 1/3 of the memory is saved.

I see that the computation time cannot be neglected either. Since the resulting matrix will be symmetric, you can only calculate one half and directly save the other. About half the calculation time is saved.

, , , , scipy , COO: .

... , , (, python, scipy).

Python (matrix = A [M] [N])

I = []
J = []
V = []
for i in range(M):
    for j in range(i:M) :
        X = 0.0
        for k in range(N):
            X += A[i ][k] * A[k][j]
        if X != 0.0 # or abs (X) > epsilon if floating point accuracy is a concern ... 
            I.append (i )
            J.append(j)
            V.append(X)
            I.append (j )
            J.append(i)
            V.append(X)

I, J, V - , COO :

RESULT = sparse.coo_matrix((V,(I,J)),shape=(N, N))
+4
def URMdist(URM):
    NLin=URM.shape[0]
    NCol=URM.shape[1]
    URMt=URM.T
    Result = lil_matrix((NLin,NLin))
    for Lin in range(0,NLin):
        X = 0.0
        for Col in range(Lin,NLin):
            X = URM[Col,:].dot(URMt[:,Lin])
            if X != 0.0: 
                Result[Lin,Col] = Result[Col,Lin] = X
    return Result
0

Source: https://habr.com/ru/post/1547106/


All Articles