To store a large matrix on disk, I use numpy.memmap.
Here is some sample code for testing large matrix multiplication:
```python
import numpy as np
import time

rows = 10000
```
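For reference, a minimal self-contained version of such a test might look like this (the file names, sizes and dtype below are placeholders, not my exact code):

```python
import numpy as np
import time

rows, cols = 10000, 1000   # placeholder sizes

# two random matrices backed by files on disk
a = np.memmap('a.dat', dtype='float32', mode='w+', shape=(rows, cols))
b = np.memmap('b.dat', dtype='float32', mode='w+', shape=(cols, rows))
a[:] = np.random.rand(rows, cols)
b[:] = np.random.rand(cols, rows)

# the result also lives on disk
c = np.memmap('c.dat', dtype='float32', mode='w+', shape=(rows, rows))

t0 = time.time()
np.dot(a, b, out=c)        # ordinary dot product on memmapped arrays
c.flush()
print('multiplication took %.2f s' % (time.time() - t0))
```

As far as I understand, even with memmapped inputs np.dot still reads the data through the OS page cache, so it is not obvious to me what the real peak memory use is.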
So my questions are:
- How can I limit the memory consumption of an application using this procedure to a certain value, for example 100 MB (or 1 GB, or whatever)? I also don't understand how to estimate the memory consumption of the procedure (I think memory is only allocated when the "data" variable is created, but how much memory is actually used when working with memmap files?)
- Is there perhaps a more optimal way to multiply large matrices stored on disk? For example, maybe the data is not stored on or read back from disk optimally, is not processed properly, and the dot product only uses one core. Should I use something like PyTables instead? (A rough block-wise sketch follows this list.)
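To make the block-wise idea from the second point concrete, here is a rough sketch of what I have in mind (block size, shapes and file names are made up); peak RAM use should scale with the block size rather than with the full matrices:

```python
import numpy as np

def blocked_dot(a_path, b_path, c_path, n, k, m, block=1000, dtype='float32'):
    """Compute C = A @ B for memmapped A (n x k) and B (k x m),
    keeping only one row panel of A and one column panel of B in RAM."""
    a = np.memmap(a_path, dtype=dtype, mode='r', shape=(n, k))
    b = np.memmap(b_path, dtype=dtype, mode='r', shape=(k, m))
    c = np.memmap(c_path, dtype=dtype, mode='w+', shape=(n, m))

    for i in range(0, n, block):
        a_rows = np.array(a[i:i + block])          # copy one row panel of A into RAM
        for j in range(0, m, block):
            b_cols = np.array(b[:, j:j + block])   # copy one column panel of B into RAM
            c[i:i + block, j:j + block] = a_rows.dot(b_cols)
    c.flush()
    return c
```

With float32 data each panel is on the order of block * k * 4 bytes, so the block size would give a direct handle on the 100 MB / 1 GB budgets mentioned above.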
I am also interested in algorithms for solving linear systems of equations (SVD, etc.) with limited memory use. I think these algorithms are called "out-of-core" or "iterative", and there seems to be an analogy here: hard drive ↔ RAM, GPU RAM ↔ CPU RAM, CPU RAM ↔ CPU cache.
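For example (this is only my rough understanding, not a tested solution), scipy's iterative solvers only need matrix-vector products, and those can be computed block by block from a memmapped matrix through a LinearOperator; the sizes, block size and file name below are made up:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

n, m, block = 20000, 500, 5000

# A tall matrix stored on disk; filled with random data block by block here
# purely so the example runs (normally it would already exist on disk).
A = np.memmap('A.dat', dtype='float32', mode='w+', shape=(n, m))
for i in range(0, n, block):
    A[i:i + block] = np.random.rand(min(block, n - i), m)
b = np.random.rand(n)

def matvec(x):
    # y = A @ x, reading A in row blocks so only `block` rows are in RAM
    x = np.asarray(x).ravel()
    y = np.empty(n)
    for i in range(0, n, block):
        y[i:i + block] = A[i:i + block].dot(x)
    return y

def rmatvec(y):
    # z = A.T @ y, accumulated block by block
    y = np.asarray(y).ravel()
    z = np.zeros(m)
    for i in range(0, n, block):
        z += A[i:i + block].T.dot(y[i:i + block])
    return z

op = LinearOperator((n, m), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
x = lsqr(op, b)[0]   # least-squares solution of A x = b, never loading A fully
```

Only one row block of A is held in RAM per matrix-vector product here, which seems to match the "iterative with limited memory" idea, but I don't know whether this is the recommended approach.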
Also, here I found some information about matrix multiplication in PyTables.
I also found this in R, but I need it for Python or Matlab.
python numpy matrix bigdata pytables
mrgloom Oct 14 '13 at 11:18