Numpy: Boolean Indexing and Memory Usage

Consider the following numpy code:

 A[start:end] = B[mask] 

Here:

  • A and B are two-dimensional arrays with the same number of columns;
  • start and end are scalars;
  • mask is a 1D boolean array;
  • (end - start) == sum(mask) .

In principle, the above operation could be carried out with O(1) temporary storage, by copying the elements of B directly into A.

Is this what actually happens in practice, or does numpy build a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?

2 answers

Using a boolean array as an index is fancy indexing, so numpy needs to make a copy. You could write a Cython extension to handle this if you are running into memory problems.
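The copy can be checked directly. The following is a small sketch using made-up example arrays (the shapes and values are assumptions, not from the question); it compares a boolean-mask selection with an ordinary slice using np.shares_memory:

```python
import numpy as np

# Example data, chosen only for illustration
B = np.arange(12).reshape(4, 3)
mask = np.array([True, False, True, False])

selected = B[mask]   # boolean (fancy) indexing: allocates a new array
sliced = B[1:3]      # basic slicing: a view onto B's buffer

fancy_shares = np.shares_memory(selected, B)
slice_shares = np.shares_memory(sliced, B)
print(fancy_shares)  # False
print(slice_shares)  # True
```

So the right-hand side of the assignment is materialized as a separate array before anything is written to A.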


The line

 A[start:end] = B[mask] 

will, by the definition of the Python language, first evaluate the right-hand side. That produces a new array containing the selected rows of B, which takes up additional memory. The most efficient pure-Python approach I know of is an explicit loop:

 from itertools import compress
 # zip is lazy on Python 3; on Python 2, use itertools.izip instead
 for i, b in zip(range(start, end), compress(B, mask)):
     A[i] = b

Of course, this will be much slower than your original code, but it uses only O(1) additional memory. Also note that itertools.compress() is available in Python 2.7, or in 3.1 and above.
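A possible middle ground, not part of the original answer, is to copy in blocks of rows so that each temporary is bounded by the block size rather than by sum(mask). The sketch below is only an illustration: the helper name masked_copy_chunked and the chunk_rows parameter are hypothetical, and the example arrays are invented.

```python
import numpy as np

def masked_copy_chunked(A, B, mask, start, chunk_rows=1024):
    """Hypothetical helper: copy B[mask] into A[start:...] block by block.

    Each temporary created by the boolean indexing has at most
    chunk_rows rows. Returns the index one past the last row written.
    """
    pos = start
    for lo in range(0, len(mask), chunk_rows):
        block_mask = mask[lo:lo + chunk_rows]   # slice of mask: a view
        k = int(np.count_nonzero(block_mask))
        if k:
            # B[lo:lo + chunk_rows] is a view; only the masked selection
            # from this block is materialized as a temporary.
            A[pos:pos + k] = B[lo:lo + chunk_rows][block_mask]
            pos += k
    return pos

# Example data, chosen only for illustration
B = np.arange(30, dtype=float).reshape(10, 3)
mask = np.arange(10) % 3 == 0          # selects rows 0, 3, 6, 9
start = 2
n = int(mask.sum())
A = np.zeros((start + n + 1, 3))

end = masked_copy_chunked(A, B, mask, start, chunk_rows=4)
```

This trades a Python-level loop over blocks for a fixed upper bound on temporary memory; a larger chunk_rows means fewer iterations but bigger temporaries.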


Source: https://habr.com/ru/post/887805/