How to find all variables with the same identifier?

Question


Let's say I have a numpy array a and create b like this:

 import numpy as np
 a = np.arange(3)
 b = a

If I now change b, for example like this

 b[0] = 100

and print a, b, their id and .flags

 print(a)
 print(a.flags)
 print(b)
 print(b.flags)
 print(id(a))
 print(id(b))

I get

 [100 1 2]
   C_CONTIGUOUS : True
   F_CONTIGUOUS : True
   OWNDATA : True
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 [100 1 2]
   C_CONTIGUOUS : True
   F_CONTIGUOUS : True
   OWNDATA : True
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 139767698376944
 139767698376944

So a and b look the same, and their ids are identical, as expected.

When I now do the same with copy()

 c = np.arange(3)
 d = c.copy()
 d[0] = 20
 print(c)
 print(c.flags)
 print(id(c))
 print(d)
 print(d.flags)
 print(id(d))

I get

 [0 1 2]
   C_CONTIGUOUS : True
   F_CONTIGUOUS : True
   OWNDATA : True
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 139767698377344
 [20 1 2]
   C_CONTIGUOUS : True
   F_CONTIGUOUS : True
   OWNDATA : True
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 139767698376864

In this case, c and d differ from each other, and so do their ids, as expected.

However, I am confused by the output of .flags: in all cases, OWNDATA is set to True. When I read the documentation, I find:

OWNDATA (O) The array owns the memory it uses or borrows it from another object.

My main question is:

What is the easiest way to find all variables pointing to the same id (in the example above, a and b), i.e. to check whether another variable with the same identifier exists? I thought OWNDATA would help, but apparently it does not.

Related questions:

What is OWNDATA actually used for, and in which cases is OWNDATA set to False?

0

python arrays numpy copy

Cleb Nov 01 '15 at


2 answers

Assigning b=a does not create a view of the original array a, but simply creates a reference to it. In other words, b is just another name for a. Both variables a and b refer to the same array, which owns its data, so the OWNDATA flag is set. Modifying b will change a.

Assigning b=a.copy() creates a copy of the original array. That is, a and b refer to separate arrays that each own their data, so the OWNDATA flag is set for both. Modifying b will not change a.

However, if you assign b=a[:], you create a view of the original array, and b will not own its data. Modifying b will change a.

The shares_memory function is what you are looking for. It does what it says on the tin: it checks whether arrays a and b share memory and thus affect each other.
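A minimal sketch of the three cases described so far. The names ref, cp and view are illustrative and not from the question; it only relies on `is`, the OWNDATA flag and the .base attribute:

```python
import numpy as np

a = np.arange(3)

ref = a          # second name for the same array object
cp = a.copy()    # independent array with its own buffer
view = a[:]      # new array object that looks at a's buffer

ref_is_a = ref is a                    # True: same object, same id
cp_owns = cp.flags['OWNDATA']          # True: the copy owns its data
view_owns = view.flags['OWNDATA']      # False: the view borrows a's data
view_base_is_a = view.base is a        # True: .base points at the owner

view[0] = 100    # writes through to a, but not to cp
print(a[0], cp[0])
```

Writing through the view changes a but leaves the copy untouched.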
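A short sketch of how that check might be used, assuming a NumPy version that provides np.shares_memory (the variable names are illustrative):

```python
import numpy as np

a = np.arange(10)
view = a[2:]      # slice: shares a's buffer
cp = a.copy()     # copy: separate buffer

view_shares = np.shares_memory(a, view)   # True
copy_shares = np.shares_memory(a, cp)     # False
print(view_shares, copy_shares)
```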

+4

Till Hoffmann Nov 01 '15 at 21:46


hpaulj · Accepted Answer · 2015-11-02 03:53

There are 2 issues: how to identify the variables you want to compare, and how to compare them.

Take the second first.

My version (1.8.2) does not have the np.shares_memory function. It has np.may_share_memory .

https://github.com/numpy/numpy/pull/6166 is the pull request that adds shares_memory; it dates from August, so you will need a recent numpy to use it. Note that the definitive test is potentially hard, and it may fail with a "TOO HARD" error message. I imagine, for example, that there are some slices that share memory but are hard to identify by simply comparing buffer starting points.

https://github.com/numpy/numpy/blob/97c35365beda55c6dead8c50df785eb857f843f0/numpy/core/tests/test_mem_overlap.py is the unit test for these memory_overlap functions. Read it if you want to see how daunting it is to think of all the possible overlap conditions between 2 known arrays.

I like to look at an array's .__array_interface__. One item in that dictionary is 'data', which is a pointer to the data buffer. An identical pointer means the data is shared. But a view might start somewhere further along the buffer. I would not be surprised if shares_memory looks at this pointer.

An identical id means that 2 variables refer to the same object, but different array objects can share a data buffer.

All of these tests require looking at specific references, so you still need to get some kind of list of references. Look at locals()? globals()? What about unnamed references, such as a list of arrays or some kind of user-defined dictionary?
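To illustrate why an exact test is harder than a bounds check, here is a small sketch, again assuming a NumPy version with np.shares_memory. The interleaved slices are an illustrative example, not taken from the pull request: their memory ranges overlap, yet they have no element in common.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]    # elements 0, 2, 4, ...
odds = a[1::2]    # elements 1, 3, 5, ...

# may_share_memory only compares the memory bounds of the two arrays,
# so it can report True even when no element is actually shared.
bounds_overlap = np.may_share_memory(evens, odds)

# shares_memory solves the exact overlap problem.
exact_overlap = np.shares_memory(evens, odds)

print(bounds_overlap, exact_overlap)
```

So may_share_memory is a cheap test that can give false positives, while shares_memory gives the exact answer at a potentially higher cost.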
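As a rough sketch of the pointer comparison, the helper data_pointer below is hypothetical (not a NumPy function); it just pulls the address out of the 'data' entry:

```python
import numpy as np

def data_pointer(arr):
    # Hypothetical helper: address of arr's first element, taken from
    # the 'data' entry of the array interface dictionary.
    return arr.__array_interface__['data'][0]

a = np.arange(10)
c = a[:]        # view starting at the same element
e = a[2:]       # view starting 2 elements later
d = a.copy()    # separate buffer

same_object = c is a                                 # False: different ids
same_start = data_pointer(c) == data_pointer(a)      # True: same buffer start
offset = data_pointer(e) - data_pointer(a)           # 2 * a.itemsize bytes
copy_differs = data_pointer(d) != data_pointer(a)    # True: own buffer
print(same_object, same_start, offset, copy_differs)
```

Note that e shares a's buffer even though its pointer differs, which is why comparing starting points alone is not a complete test.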

Example IPython run:

Some variables and references:

 In [1]: a=np.arange(10)
 In [2]: b=a          # reference
 In [3]: c=a[:]       # view
 In [4]: d=a.copy()   # copy
 In [5]: e=a[2:]      # another view
 In [6]: ll=[a, a[:], a[3:], a[[1,2,3]]]   # list

Compare id:

 In [7]: id(a)
 Out[7]: 142453472
 In [9]: id(b)
 Out[9]: 142453472

None of the others share the id, except ll[0].

 In [10]: np.may_share_memory(a,b)
 Out[10]: True
 In [11]: np.may_share_memory(a,c)
 Out[11]: True
 In [12]: np.may_share_memory(a,d)
 Out[12]: False
 In [13]: np.may_share_memory(a,e)
 Out[13]: True
 In [14]: np.may_share_memory(a,ll[3])
 Out[14]: False

That is about what I would expect: views share memory, copies do not.

 In [15]: a.__array_interface__
 Out[15]:
 {'version': 3,
  'data': (143173312, False),
  'typestr': '<i4',
  'descr': [('', '<i4')],
  'shape': (10,),
  'strides': None}
 In [16]: a.__array_interface__['data']
 Out[16]: (143173312, False)
 In [17]: b.__array_interface__['data']
 Out[17]: (143173312, False)
 In [18]: c.__array_interface__['data']
 Out[18]: (143173312, False)
 In [19]: d.__array_interface__['data']
 Out[19]: (151258096, False)   # copy - diff buffer
 In [20]: e.__array_interface__['data']
 Out[20]: (143173320, False)   # differs by 8 bytes
 In [21]: ll[1].__array_interface__['data']
 Out[21]: (143173312, False)   # same point

Just with this short session, I have 76 items in locals(). But I can search it for matching ids with:

 In [26]: [(k,v) for k,v in locals().items() if id(v)==id(a)]
 Out[26]:
 [('a', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])),
  ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

The same goes for other tests.
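For instance, the same kind of search with may_share_memory instead of id might look like the sketch below. The explicit namespace dictionary stands in for locals() so the example stays self-contained; that substitution is an assumption of this sketch.

```python
import numpy as np

a = np.arange(10)
# Stand-in for locals(): the names from the session above.
namespace = {'a': a, 'b': a, 'c': a[:], 'd': a.copy(), 'e': a[2:], 'n': 3}

# Keep only arrays, then test each one against a.
sharing = sorted(
    name for name, value in namespace.items()
    if isinstance(value, np.ndarray) and np.may_share_memory(value, a)
)
print(sharing)
```

The isinstance check skips non-array entries before any memory test is applied to them.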

I can search ll in the same way:

 In [28]: [n for n,l in enumerate(ll) if id(l)==id(a)]
 Out[28]: [0]

And I could add a layer to the locals() search, checking if the item is a list or a dictionary, and do a search inside that.
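One way that extra layer could look, as a sketch only: find_references is a hypothetical helper that descends a single level into lists, tuples and dictionaries, and it is given an explicit dictionary rather than locals().

```python
import numpy as np

def find_references(target, namespace):
    """Return names (or one-level 'name[key]' paths) that refer to target."""
    hits = []
    for name, value in list(namespace.items()):
        if value is target:                      # identity, same as comparing id()
            hits.append(name)
        elif isinstance(value, (list, tuple)):   # look one level into sequences
            hits += ['%s[%d]' % (name, i)
                     for i, item in enumerate(value) if item is target]
        elif isinstance(value, dict):            # and one level into dictionaries
            hits += ['%s[%r]' % (name, key)
                     for key, item in value.items() if item is target]
    return sorted(hits)

a = np.arange(10)
namespace = {'a': a, 'b': a, 'c': a[:], 'd': a.copy(),
             'll': [a, a[:]], 'dd': {'x': a}}
found = find_references(a, namespace)
print(found)
```

It uses `is` rather than `==`, because `==` on arrays compares element by element. Deeper nesting, object attributes and closures are not covered.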

So, even if we settle on a testing method, it is not trivial to find all possible references.

I think the best approach is simply to understand your own use of variables, so that you can clearly identify references, views, and copies. In selected cases, you can run tests like may_share_memory or compare data pointers. But there is no inexpensive, definitive test. When in doubt, it is cheaper to make a copy than to risk overwriting something. In my years of using numpy I have never felt the need for a definitive answer to this question.

I do not find the OWNDATA flag very useful. Consider the above variables:

 In [35]: a.flags['OWNDATA']
 Out[35]: True
 In [36]: b.flags['OWNDATA']   # ref
 Out[36]: True
 In [37]: c.flags['OWNDATA']   # view
 Out[37]: False
 In [38]: d.flags['OWNDATA']   # copy
 Out[38]: True
 In [39]: e.flags['OWNDATA']   # view
 Out[39]: False

While I can predict the OWNDATA value in these simple cases, its value does not say much about shared memory or a shared id. False suggests that the array was created from another array and thus may share memory. But that is just a "may".

I often create a sample array by reshaping a range.

 In [40]: np.arange(3).flags['OWNDATA']
 Out[40]: True
 In [41]: np.arange(4).reshape(2,2).flags['OWNDATA']
 Out[41]: False

There is clearly no other reference to the data, but the reshaped array does not "own" its own data. The same thing will happen with

 temp = np.arange(4); temp = temp.reshape(2,2)

I would have to do

 temp = np.arange(4); temp.shape = (2,2)

to keep OWNDATA true. A False OWNDATA means something right after the new array object is created, but it does not change if the original reference is redefined or deleted. It easily becomes out of date.
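A small sketch of that situation, using the .base attribute to show where the data actually lives (the variable names are illustrative):

```python
import numpy as np

temp = np.arange(4).reshape(2, 2)

# The 1-D arange result has no name of its own, yet it still owns the buffer;
# the reshaped array is a view whose .base is that anonymous array.
owns = temp.flags['OWNDATA']      # False
has_base = temp.base is not None  # True
base_shape = temp.base.shape      # (4,)

# An array that is not a view has no base at all.
plain = np.arange(4)
plain_owns = plain.flags['OWNDATA']   # True
plain_base = plain.base               # None
print(owns, has_base, base_shape, plain_owns, plain_base)
```

So OWNDATA being False here says nothing about whether any other variable can still reach the data.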
