Operations on huge dense matrices in numpy

To train a neural network, at one point I have a huge matrix phi of size 212,243 × 2,500, plus vectors y (212,243) and w (2,500), all stored as numpy arrays of doubles. What I'm trying to compute is:

 w = dot(pinv(phi), y)       # serialize w...
 r = dot(w, transpose(phi))  # serialize r...

My machine has 6 GB of RAM and 16 GB of swap, running 64-bit Ubuntu. I have started the computation a couple of times, and each run ended with swap-related errors from the system (not from Python) after about an hour.

Is there a way to do this computation on my machine? It doesn't have to be done in Python.

+4
source share
2 answers

If you don't need the pseudo-inverse for anything other than computing w, replace that line with:

 w = np.linalg.lstsq(phi, y)[0] 

On my system, this runs about twice as fast and uses about half the intermediate storage.
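A minimal sketch of the substitution, using a small random stand-in for the real phi (212,243 × 2,500 won't fit in a demo); `rcond=None` is passed only to select lstsq's current default cutoff behavior:

```python
import numpy as np

# Hypothetical small stand-ins for the real phi and y from the question.
rng = np.random.default_rng(0)
phi = rng.standard_normal((50, 10))
y = rng.standard_normal(50)

# Original approach: materialize the full pseudo-inverse, then multiply.
w_pinv = np.dot(np.linalg.pinv(phi), y)

# Suggested approach: solve the least-squares problem directly,
# without ever forming pinv(phi) as an explicit matrix.
w_lstsq = np.linalg.lstsq(phi, y, rcond=None)[0]

# Both compute the same minimum-norm least-squares solution.
print(np.allclose(w_pinv, w_lstsq))
```

The two are mathematically equivalent for this use: `pinv(phi) @ y` is exactly the least-squares solution that `lstsq` returns, so skipping the explicit pseudo-inverse avoids one 2,500 × 212,243 intermediate.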

+3
source

Let's do the arithmetic:

 212,243 rows * 2,500 cols * 8 bytes/value = 4,244,860,000 bytes ≈ 4 GB 

Just to hold phi in memory you need about 4 GB, and pinv(phi) is another matrix of the same size, so the intermediates alone exceed your 6 GB of RAM and push you into swap.
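The budget above can be checked with a few lines; the ~7.9 GiB figure counts only phi and its pseudo-inverse, not the extra SVD workspace that `pinv` allocates internally:

```python
# Rough memory budget for the arrays in the question (float64 = 8 bytes).
rows, cols = 212_243, 2_500

phi_bytes = rows * cols * 8    # the matrix itself
pinv_bytes = cols * rows * 8   # pinv(phi) has the transposed shape, same size

print(phi_bytes)                          # 4,244,860,000 bytes, ~4 GB
print((phi_bytes + pinv_bytes) / 2**30)   # ~7.9 GiB before any SVD workspace
```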

If this were Java, I would recommend increasing the maximum heap size of your JVM. I don't know what the analogue is in Python.

0
source

Source: https://habr.com/ru/post/1479985/
