Numpy: Logical Indexing and Memory Usage

Question

Consider the following numpy code:

 A[start:end] = B[mask]

Here:

A and B are two-dimensional arrays with the same number of columns;
start and end are scalars;
mask is a 1D boolean array;
(end - start) == sum(mask).

In principle, the above operation can be performed using O(1) temporary storage, by copying the elements of B directly into A.

Is this what actually happens in practice, or does numpy build a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the expression?

+6

python memory-management numpy large-data

NPE May 11 '11 at 9:19

2 answers

The line

 A[start:end] = B[mask]

will, according to the definition of the Python language, first evaluate the right-hand side, yielding a new array that contains the selected rows of B and takes up additional memory. The most efficient pure-Python method I know of is to use an explicit loop:

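 from itertools import compress

 # On Python 2, use itertools.izip in place of zip to avoid building a list.
 for i, b in zip(range(start, end), compress(B, mask)):
     A[i] = b  # copy one selected row of B into A at a time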

Of course, this will be much slower than your original code, but it uses only O(1) additional memory. Also note that itertools.compress() is available in Python 2.7, or in 3.1 and higher.
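A possible middle ground between the one-line assignment and the row-by-row loop is to copy in blocks, so that the temporary never holds more than a fixed number of rows. The following is only a sketch of that idea, not something from the answer above: the function name copy_masked_rows_chunked and the chunk parameter are made up for illustration, and it assumes A has room for sum(mask) rows starting at start.

```python
import numpy as np

def copy_masked_rows_chunked(A, start, B, mask, chunk=1024):
    """Copy B[mask] into A[start:start + mask.sum()] block by block.

    Hypothetical helper: each step builds a temporary of at most
    `chunk` rows instead of one temporary for the whole of B[mask].
    Returns the index in A just past the last row written.
    """
    pos = start
    for lo in range(0, B.shape[0], chunk):
        block_mask = mask[lo:lo + chunk]
        rows = B[lo:lo + chunk][block_mask]  # small temporary copy
        A[pos:pos + rows.shape[0]] = rows
        pos += rows.shape[0]
    return pos

# Small usage example with made-up data.
B = np.arange(20).reshape(10, 2)
mask = np.array([True, False, True, True, False,
                 False, True, False, False, True])
A = np.zeros((8, 2), dtype=B.dtype)
end = copy_masked_rows_chunked(A, 1, B, mask, chunk=3)
```

The extra memory here is bounded by chunk rows rather than being strictly O(1), but most of the work stays inside numpy, so it should be far quicker than looping over single rows.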

+3

Sven marnach May 11 '11 at 9:53

tillsten · Accepted Answer · 2011-05-11T09:52:28+0000

Using a boolean array as an index is fancy indexing, so numpy needs to make a copy. You can write a Cython extension to handle this if you are running into memory issues.
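To see the copy for yourself, you can compare a plain slice with a boolean-mask selection using np.shares_memory. This is just a small check on made-up data; the variable names are illustrative.

```python
import numpy as np

B = np.arange(12).reshape(6, 2)
mask = np.array([True, False, True, False, False, True])

view = B[1:4]        # basic slicing: a view onto B's buffer
selected = B[mask]   # boolean (fancy) indexing: a new array

slice_is_view = np.shares_memory(B, view)
mask_is_copy = not np.shares_memory(B, selected)

# Writing to the fancy-indexed result does not touch B.
selected[0, 0] = -1
```

Both slice_is_view and mask_is_copy come out True, and B[0, 0] keeps its original value after the write to selected.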
