This is not really a full answer, but I don't have enough reputation to comment, sorry. In response to CedericDeBoom's comment: numpy.memmap does not require the size to be known in advance; see also my answer here. Turning to the actual question of using memory-mapped buffers for a sparse matrix, I wrote the following test:
    import numpy as np
    import scipy.sparse as sp

    s = 2**30
    # Three memory-mapped buffers backing a CSC matrix (~12 GiB on disk in total)
    data = np.memmap("sp_data.bin", dtype=np.float32, mode="w+", shape=(s,))
    indices = np.memmap("sp_indices.bin", dtype=np.int32, mode="w+", shape=(s,))
    indptr = np.memmap("sp_indptr.bin", dtype=np.int32, mode="w+", shape=(s+1,))
    A = sp.csc_matrix((data, indices, indptr), shape=(s, s))
While the memmap files have a total size of more than 12 GB on disk (4 GiB each for data, indices, and indptr), less than 1 GB of RAM was used.
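If you want to reproduce that measurement, one way to check resident memory from inside the process is with psutil; this is my own suggestion, not how the figure above was obtained:

    import psutil

    # Resident set size (RSS) of the current process, in MiB
    print(f"RSS: {psutil.Process().memory_info().rss / 2**20:.1f} MiB")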
Therefore, I think it should indeed be possible to build data, indices, and indptr incrementally, as shown in my previous answer, and then build a scipy.sparse.csc_matrix from them, as sketched below.
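Here is a minimal sketch of that incremental, column-by-column construction. The dimensions, file names, and the random 5-nonzeros-per-column generator are all made up for the example; replace them with whatever actually produces your columns:

    import numpy as np
    import scipy.sparse as sp

    rng = np.random.default_rng(0)
    n_rows, n_cols = 1000, 1000
    nnz_cap = 50_000  # assumed upper bound on the total number of nonzeros

    data = np.memmap("sp_data.bin", dtype=np.float32, mode="w+", shape=(nnz_cap,))
    indices = np.memmap("sp_indices.bin", dtype=np.int32, mode="w+", shape=(nnz_cap,))
    indptr = np.memmap("sp_indptr.bin", dtype=np.int32, mode="w+", shape=(n_cols + 1,))

    pos = 0
    indptr[0] = 0
    for j in range(n_cols):
        # Toy column: 5 random, sorted, unique row indices with random values
        # (a stand-in for the real per-column data source)
        rows = np.sort(rng.choice(n_rows, size=5, replace=False)).astype(np.int32)
        vals = rng.random(5).astype(np.float32)
        k = len(rows)
        indices[pos:pos + k] = rows
        data[pos:pos + k] = vals
        pos += k
        indptr[j + 1] = pos  # column j occupies data[indptr[j]:indptr[j+1]]

    A = sp.csc_matrix((data[:pos], indices[:pos], indptr), shape=(n_rows, n_cols))

Since only the slices currently being written are touched, the OS pages the buffers in and out as needed, and the process's resident memory stays small even for very large matrices.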
Hope this helps. Best regards, Lucas