I have very large matrices (say, on the order of millions of rows) that I cannot keep in memory, and I need to access subsamples of these matrices in a reasonable amount of time (less than a minute...). I started looking at HDF5 and Blaze in combination with NumPy and pandas.
But I found this approach a bit complicated, and I'm not sure it is the best solution.
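To compare against the HDF5 route, here is a minimal sketch of a simpler baseline. It stores a dense matrix in a raw binary file through `numpy.memmap` and copies a random subsample of rows into memory. This swaps in a different technique; it is not HDF5 or Blaze. The file name, shape and dtype are hypothetical. The sketch also assumes the dense matrix fits on disk. A million-by-million float32 matrix would need about 4 TB, so for sparse data an on-disk sparse format is probably a better fit.

```python
import os
import tempfile

import numpy as np

# Hypothetical shape and dtype; the real matrix would have millions of rows.
N_ROWS, N_COLS = 1_000, 50
DTYPE = np.float32


def create_on_disk_matrix(path, shape=(N_ROWS, N_COLS), dtype=DTYPE):
    """Create a zero-filled matrix whose storage lives in a file, not in RAM."""
    return np.memmap(path, dtype=dtype, mode="w+", shape=shape)


def read_rows(path, rows, shape=(N_ROWS, N_COLS), dtype=DTYPE):
    """Copy only the requested rows (in ascending order) from disk into RAM."""
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    # Sorting the indices makes the reads move forward through the file.
    rows = np.sort(np.asarray(rows))
    # np.array forces an in-memory copy, so the mapping can be released.
    return np.array(mm[rows])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.dat")  # hypothetical file name
        mm = create_on_disk_matrix(path)
        mm[10, 3] = 7.0
        mm.flush()  # make sure the write reaches the file
        del mm      # release the writable mapping before reading

        rng = np.random.default_rng(0)
        sample = rng.choice(N_ROWS, size=20, replace=False)
        sub = read_rows(path, sample)
        print(sub.shape)
```

`read_rows` sorts the requested indices, so rows come back in ascending order. The operating system only has to page in the parts of the file that hold those rows.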
Are there other solutions?
Thanks
EDIT
Here are a few more details about the kind of data I'm dealing with.
- The matrices are usually sparse (typically under 10%, and always under 25%, of the cells are non-zero).
- The matrices are symmetric.
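These two properties can be exploited together. Because the matrix is symmetric, only entries with i <= j need to be stored. Because it is sparse, only the non-zero ones need to be kept. Below is a minimal in-memory sketch of that storage scheme, in which reads are mirrored across the diagonal. The class and method names are hypothetical and do not come from any library. The same key layout could in principle be kept in an on-disk store.

```python
from typing import Dict, Iterable, List, Tuple


class SymmetricSparseMatrix:
    """Hypothetical helper: stores only the non-zero entries with i <= j."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._data: Dict[Tuple[int, int], float] = {}

    def _key(self, i: int, j: int) -> Tuple[int, int]:
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError(f"({i}, {j}) is outside a {self.n}x{self.n} matrix")
        # Entry (j, i) equals entry (i, j), so both map to the upper-triangle key.
        return (i, j) if i <= j else (j, i)

    def __setitem__(self, idx: Tuple[int, int], value: float) -> None:
        key = self._key(*idx)
        if value == 0:
            self._data.pop(key, None)  # zeros are never stored
        else:
            self._data[key] = value

    def __getitem__(self, idx: Tuple[int, int]) -> float:
        return self._data.get(self._key(*idx), 0.0)

    def nnz(self) -> int:
        """Non-zeros of the full matrix: off-diagonal entries count twice."""
        return sum(1 if i == j else 2 for i, j in self._data)

    def density(self) -> float:
        """Fraction of non-zero cells in the full n x n matrix."""
        return self.nnz() / (self.n * self.n)

    def block(self, rows: Iterable[int], cols: Iterable[int]) -> List[List[float]]:
        """Extract a rectangular sub-matrix as nested lists."""
        cols = list(cols)
        return [[self[i, j] for j in cols] for i in rows]


m = SymmetricSparseMatrix(4)
m[0, 2] = 1.5
m[3, 3] = 2.0
print(m[2, 0])      # 1.5
print(m.density())  # 0.1875
```

Storing only the upper triangle roughly halves the number of stored entries. If the data already sits in a SciPy sparse matrix, `scipy.sparse.triu` can drop the lower triangle in a similar way.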
And what I need to do: