How to avoid a MemoryError in mtrand.RandomState.choice?

I'm trying to sample 1e7 elements from a list of 1e5 strings, but I get a MemoryError. Sampling 1e6 elements from a list of 1e4 strings works fine. I'm on a 64-bit machine with 4 GB of RAM and don't think I should be hitting the memory limit at 1e7. Any ideas?

    $ python3
    Python 3.3.3 (default, Nov 27 2013, 17:12:35)
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> K = 100

It works fine with N = 1e6:

    >>> N = int(1e6)
    >>> np.random.choice(["id%010d"%x for x in range(N//K)], N)
    array(['id0000005473', 'id0000005694', 'id0000004115', ...,
           'id0000006958', 'id0000009972', 'id0000003009'],
          dtype='<U12')

But with N = 1e7 I get an error:

    >>> N = int(1e7)
    >>> np.random.choice(["id%010d"%x for x in range(N//K)], N)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "mtrand.pyx", line 1092, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:8229)
    MemoryError
    >>>
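A quick back-of-the-envelope check of how much memory the result alone needs. The IDs are 12-character strings, and numpy stores `str` data as UCS-4 (4 bytes per character), so the `'<U12'` array returned by `np.random.choice` costs 48 bytes per element. This sketch counts only the returned array; the Python list of source strings and any temporaries `choice` allocates internally come on top of it, so the real peak is higher. The helper name `result_mb` is just for illustration.

```python
import numpy as np

# Each ID looks like "id0000005473": 12 characters.
# numpy stores str data as UCS-4 (4 bytes per character), so dtype '<U12'
# has an itemsize of 12 * 4 = 48 bytes.
itemsize = np.array(["id%010d" % 0]).dtype.itemsize

def result_mb(n_samples):
    """Size in MB of the '<U12' array returned for n_samples draws (result only)."""
    return n_samples * itemsize / 1e6

print(itemsize)             # 48
print(result_mb(int(1e6)))  # 48.0
print(result_mb(int(1e7)))  # 480.0
```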

I found this question, but it is about catching the error rather than avoiding it:

Python won't catch MemoryError

I'd be happy with either a solution that uses random.choice or a different method. Thanks.

python
Sep 02 '14 at 15:33
1 answer

You can get around this with a generator function:

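    import numpy as np
    def item():
        # One ID at a time (N and K as defined in the question); range, not xrange, on Python 3
        for _ in range(N):
            yield "id%010d" % np.random.choice(N // K)  # no size argument -> a single integer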

This avoids holding all N elements in memory at once: each ID is created only when the generator is asked for the next one.
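Calling `np.random.choice` once per element, as the generator does, can be slow for 1e7 draws because of the per-call overhead. A middle ground, sketched below under that assumption, is to keep using `np.random.choice` but draw in fixed-size chunks, so peak memory is bounded by the chunk size rather than by N (a chunk of 10**6 `'<U12'` IDs is about 48 MB). `choice_in_chunks` and `chunk_size` are hypothetical names, not part of numpy.

```python
import numpy as np

def choice_in_chunks(ids, n_samples, chunk_size=10**6):
    """Hypothetical helper: yield n_samples draws (with replacement) from ids,
    at most chunk_size at a time, so only one chunk of strings is alive at once."""
    ids = np.asarray(ids)  # convert the list once instead of on every call
    remaining = n_samples
    while remaining > 0:
        size = min(chunk_size, remaining)
        yield np.random.choice(ids, size)
        remaining -= size

K = 100
N = int(1e7)
ids = ["id%010d" % x for x in range(N // K)]

total = 0
for chunk in choice_in_chunks(ids, N):
    total += len(chunk)  # stand-in for real processing, e.g. writing the chunk to disk
print(total)  # 10000000
```

Converting `ids` to an array once up front avoids re-converting the 1e5-element list on every chunk; tune `chunk_size` to trade memory for fewer calls.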

Sep 02 '14 at 16:44