I have a 3 million x 9 million sparse matrix with several billion non-zero entries. R and Python do not allow sparse matrices with more than MAXINT non-zero entries, so I found myself using Julia.
While scaling this data by the standard deviations is trivial, centering it naively is of course a no-go, as it would create a dense matrix on the order of 200 terabytes.
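To make that concrete, here is a small sketch (toy dimensions of my own, written against the current-Julia SparseArrays API rather than the 0.6-era API of the snippet quoted below) of why scaling stays cheap while naive centering does not:

```julia
using SparseArrays, Statistics, LinearAlgebra

A = sprand(1000, 2000, 0.001)       # stand-in for the real 3M x 9M matrix

# Scaling columns preserves sparsity: right-multiply by a diagonal matrix.
s = collect(vec(std(A, dims=1)))    # per-column standard deviations
s[iszero.(s)] .= 1                  # guard against all-zero columns
As = A * Diagonal(1 ./ s)           # still sparse, same nonzero pattern

# Naive centering does not: subtracting the column means turns almost every
# stored zero into a nonzero, so the result is effectively a dense matrix.
Ac = A .- mean(A, dims=1)
```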
The relevant Julia code for svds can be found at https://github.com/JuliaLang/julia/blob/343b7f56fcc84b20cd1a9566fd548130bb883505/base/linalg/arnoldi.jl#L398
From my reading, the key element of this code is the AtA_or_AAt structure and several functions around it, in particular A_mul_B!. Copied below for your convenience.
struct AtA_or_AAt{T,S} <: AbstractArray{T, 2}
    A::S
    buffer::Vector{T}
end

function AtA_or_AAt(A::AbstractMatrix{T}) where T
    Tnew = typeof(zero(T)/sqrt(one(T)))
    Anew = convert(AbstractMatrix{Tnew}, A)
    AtA_or_AAt{Tnew,typeof(Anew)}(Anew, Vector{Tnew}(max(size(A)...)))
end

function A_mul_B!(y::StridedVector{T}, A::AtA_or_AAt{T}, x::StridedVector{T}) where T
    if size(A.A, 1) >= size(A.A, 2)
        A_mul_B!(A.buffer, A.A, x)
        return Ac_mul_B!(y, A.A, A.buffer)
    else
        Ac_mul_B!(A.buffer, A.A, x)
        return A_mul_B!(y, A.A, A.buffer)
    end
end

size(A::AtA_or_AAt) = ntuple(i -> min(size(A.A)...), Val(2))
ishermitian(s::AtA_or_AAt) = true
This is passed to the eigs function, where some kind of magic happens, and then the output is processed into the appropriate components for SVD.
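As a sanity check of the identity eigs is exploiting (this small dense verification is mine, not from the linked file): the eigenvalues of A'A, or AA' when that is the smaller product, are the squared singular values of A, so eigs on the AtA_or_AAt operator plus a square root recovers the singular values.

```julia
using SparseArrays, LinearAlgebra

A = sprand(200, 80, 0.05)
# Eigenvalues of A'A, computed densely here only because the example is tiny:
lams = sort(eigvals(Symmetric(Matrix(A' * A))), rev=true)
# Their square roots match the singular values of A; eigs performs the same
# computation matrix-free through the AtA_or_AAt wrapper.
svals = svdvals(Matrix(A))
# sqrt.(lams) ≈ svals, up to floating-point noise near zero
```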
I think the best way to handle centering on the fly is something like a subclass of AtA_or_AAt, say AtA_or_AAt_centered, which more or less mimics its behavior but also stores the column means and overrides A_mul_B! accordingly.
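The arithmetic such an override would have to implement (my sketch and naming, not existing Julia code): with mu the vector of column means, the centered matrix is B = A - ones(m)*mu', and both products can be applied without ever forming B, since B*x = A*x - (mu⋅x)*ones(m) and B'*y = A'*y - sum(y)*mu.

```julia
using SparseArrays, Statistics, LinearAlgebra

# Hypothetical centered A'A operator; only the multiply logic is shown,
# not the AbstractArray plumbing that eigs would additionally need.
struct CenteredAtA{T,S<:AbstractMatrix{T}}
    A::S
    mu::Vector{T}                      # column means, stored once
end
CenteredAtA(A) = CenteredAtA(A, collect(vec(mean(A, dims=1))))

# Computes (A - 1*mu')' * (A - 1*mu') * x using only sparse products.
function atax(C::CenteredAtA, x::AbstractVector)
    t = C.A * x .- dot(C.mu, x)        # t = B*x: sparse mat-vec plus a shift
    return C.A' * t .- sum(t) .* C.mu  # B'*t: adjoint mat-vec plus a shift
end
```

In the real wrapper this logic would live in A_mul_B! (mul! on current Julia), switching between A'A and AA' by the smaller dimension exactly as the quoted code does.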
However, I am not fluent enough in Julia to implement this, and I am also unsure how the output of eigs gets turned back into the pieces of the SVD (U, s, V, i.e. the left singular vectors, singular values, and right singular vectors).
edit: I am new to Julia, so I do not know whether a "Centered Sparse Matrix" type like this already exists somewhere in the ecosystem. If it does, a pointer would save me a lot of work. If not, any advice on how to implement one correctly and efficiently would be appreciated.