Is there a way to check if NumPy arrays have the same data?

Question

My impression is that in NumPy two arrays can use the same memory. Take the following example:

    import numpy as np

    a = np.arange(27)
    b = a.reshape((3, 3, 3))
    a[0] = 5000
    print(b[0, 0, 0])  # 5000

    # Some tests:
    a.data is b.data  # False
    a.data == b.data  # True

    c = np.arange(27)
    c[0] = 5000
    a.data == c.data  # True (same data, not same memory storage): false positive

So b did not make a copy of a; it simply created some new metadata and attached it to the same memory buffer that a uses. Is there a way to check whether two arrays refer to the same memory buffer?

My first thought was to use a.data is b.data, but that returns False. I can do a.data == b.data, which returns True, but I don't think that checks whether a and b share the same memory buffer. It only checks that the memory block referenced by a and the one referenced by b contain the same bytes.

+30

python numpy

mgilson Jul 02 2018-12-12T00:

3 answers

You can use the base attribute to check if the array shares memory with another array:

    >>> import numpy as np
    >>> a = np.arange(27)
    >>> b = a.reshape((3,3,3))
    >>> b.base is a
    True
    >>> a.base is b
    False

Not sure if this solves your problem. The base attribute will be None if the array owns its own memory. Note that the base of a view is the array it was derived from, even when the view covers only a subset of that array:

    >>> c = a[2:]
    >>> c.base is a
    True
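To compare two views that were both derived from the same array, one option is to walk each `.base` chain back to the object that owns the memory and compare the owners. This is only a rough sketch: `owner` is a hypothetical helper, not a NumPy function, and it assumes every view was created through ordinary NumPy indexing or reshaping.

```python
import numpy as np

def owner(arr):
    """Follow .base until reaching an object without one (hypothetical helper)."""
    while getattr(arr, "base", None) is not None:
        arr = arr.base
    return arr

a = np.arange(27)
b = a.reshape((3, 3, 3))   # view of a
c = a[2:]                  # another view of a
d = np.arange(27)          # independent array with equal contents

same_owner_bc = owner(b) is owner(c)   # both trace back to a
same_owner_bd = owner(b) is owner(d)   # d owns its own buffer
print(same_owner_bc, same_owner_bd)
```

Keep in mind that a shared owner does not prove that two views overlap: `a[:2]` and `a[2:]` have the same owner but no bytes in common.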

+25

jterrace Jul 02 '12 at 1:30

Just do:

    a = np.arange(27)
    a.__array_interface__['data']

The second line returns a tuple whose first entry is the memory address and whose second entry is a read-only flag. Combined with the shape and data type, this lets you determine the exact range of memory addresses that the array spans, so you can also handle the case where one array is a subset of the other.
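As a minimal sketch of that idea, the snippet below computes the address range for the simple case of a C-contiguous array, where the range is just the start address plus `nbytes`. `byte_interval` is a hypothetical helper name; strided or otherwise non-contiguous arrays would need the strides taken into account, which this sketch deliberately refuses to handle.

```python
import numpy as np

def byte_interval(arr):
    """Return (start, end) addresses of a C-contiguous array (hypothetical helper)."""
    if not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("this sketch only handles C-contiguous arrays")
    start, _readonly = arr.__array_interface__["data"]
    return start, start + arr.nbytes

a = np.arange(27)
c = a[2:]  # contiguous subset of a

a_start, a_end = byte_interval(a)
c_start, c_end = byte_interval(c)

# c lies entirely inside a's interval, two elements past its start.
c_inside_a = a_start <= c_start and c_end <= a_end
print(c_inside_a, c_start - a_start)
```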

+4

Nir Friedman Mar 10 '15 at

user545424 · Accepted Answer · 2012-07-02 03:16

I think jterrace's answer is probably the best way to go, but here is another possibility.

    import numpy as np

    def byte_offset(a):
        """Returns a 1-d array of the byte offset of every element in `a`.
        Note that these will not in general be in order."""
        stride_offset = np.ix_(*map(range, a.shape))
        element_offset = sum(i*s for i, s in zip(stride_offset, a.strides))
        element_offset = np.asarray(element_offset).ravel()
        return np.concatenate([element_offset + x for x in range(a.itemsize)])

    def share_memory(a, b):
        """Returns the number of shared bytes between arrays `a` and `b`."""
        # Note: in NumPy 2.0 and later, byte_bounds lives in
        # np.lib.array_utils rather than in the top-level namespace.
        a_low, a_high = np.byte_bounds(a)
        b_low, b_high = np.byte_bounds(b)

        beg, end = max(a_low, b_low), min(a_high, b_high)

        if end - beg > 0:
            # memory overlaps
            amem = a_low + byte_offset(a)
            bmem = b_low + byte_offset(b)
            return np.intersect1d(amem, bmem).size
        else:
            return 0

Example:

    >>> a = np.arange(10)
    >>> b = a.reshape((5,2))
    >>> c = a[::2]
    >>> d = a[1::2]
    >>> e = a[0:1]
    >>> f = a[0:1]
    >>> f = f.reshape(())
    >>> share_memory(a,b)
    80
    >>> share_memory(a,c)
    40
    >>> share_memory(a,d)
    40
    >>> share_memory(c,d)
    0
    >>> share_memory(a,e)
    8
    >>> share_memory(a,f)
    8
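For comparison, NumPy itself provides two related checks, `np.may_share_memory` and `np.shares_memory`. The sketch below is not part of the answer above; it just runs them on the same interleaved slices. `np.may_share_memory` only compares memory bounds, so it can report overlap where there is none, while `np.shares_memory` performs an exact check and can be slow for some inputs.

```python
import numpy as np

a = np.arange(10)
c = a[::2]    # even-indexed elements
d = a[1::2]   # odd-indexed elements

# Bounds of c and d overlap, but no element is actually shared.
bounds_overlap = np.may_share_memory(c, d)
exact_overlap = np.shares_memory(c, d)

# A view always shares memory with the array it came from.
view_overlap = np.shares_memory(a, c)

# Equal contents in a separate buffer do not count as shared memory.
copy_overlap = np.shares_memory(a, np.arange(10))

print(bounds_overlap, exact_overlap, view_overlap, copy_overlap)
```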

Here is a graph showing the time for each call to share_memory(a,a[::2]) as a function of the number of elements in a on my computer.

[Figure: share_memory function]
