Why is a copy created when None is assigned to a pandas DataFrame slice?

    In [216]: foo = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})
    In [217]: bar = foo.ix[:1]
    In [218]: bar
    Out[218]:
       a  b
    0  1  3
    1  2  4

The view is created as expected.

    In [219]: bar['a'] = 100
    In [220]: bar
    Out[220]:
         a  b
    0  100  3
    1  100  4
    In [221]: foo
    Out[221]:
         a  b
    0  100  3
    1  100  4
    2    3  5

Changing the view also changes the original DataFrame, foo. However, if None is assigned instead, a copy is created. Can anyone shed light on what is happening, and perhaps on the logic behind it?

    In [222]: bar['a'] = None
    In [223]: bar
    Out[223]:
          a  b
    0  None  3
    1  None  4
    In [224]: foo
    Out[224]:
         a  b
    0  100  3
    1  100  4
    2    3  5

2 answers

When you assign bar['a'] = None, you force the column to change its dtype, for example from int64 to object.

As a result, a new object array is allocated for the column, and bar stores this new array in place of the old one. The old array, which is still shared with the original DataFrame, is never written to, so foo keeps its values.
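To make the mechanism concrete, here is a minimal NumPy sketch. It assumes a deliberately simplified model, invented for illustration and not how pandas actually stores its data: a "frame" is just a dict of column arrays, and a slice holds views of the parent's arrays. A same-dtype write lands in the shared buffer, while None forces a new object array to be bound under the column name, leaving the parent's buffer untouched.

```python
import numpy as np

# Hypothetical toy model (NOT pandas internals): a "frame" is a dict of
# column arrays, and a row slice holds views of the parent's columns.
foo = {"a": np.array([1, 2, 3]), "b": np.array([3, 4, 5])}
bar = {name: col[:2] for name, col in foo.items()}  # basic slicing -> views

# Same-dtype assignment writes into the buffer shared with foo.
bar["a"][:] = 100
print(foo["a"])  # [100 100   3]

# None does not fit in an integer array, so a new object array is built
# and rebound under the same column name in bar only.
new_a = np.empty(len(bar["a"]), dtype=object)
new_a[:] = None
bar["a"] = new_a

print(bar["a"])  # [None None]
print(foo["a"])  # [100 100   3]  (parent buffer untouched)
```

Nothing is copied out of foo in the second step; the sharing is broken simply because bar now points at a different array.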

You are performing a form of chained assignment; see here for why this is a really bad idea.

See also this question.

Pandas will generally warn you that you are modifying a view (especially as of 0.15.0).

    In [49]: foo = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})

    In [51]: foo
    Out[51]:
       a  b
    0  1  3
    1  2  4
    2  3  5

    In [52]: bar = foo.ix[:1]

    In [53]: bar
    Out[53]:
       a  b
    0  1  3
    1  2  4

    In [54]: bar.dtypes
    Out[54]:
    a    int64
    b    int64
    dtype: object

    # this is an internal method (but is for illustration)
    In [56]: bar._is_view
    Out[56]: True

    # this will warn in 0.15.0
    In [57]: bar['a'] = 100
    /usr/local/bin/ipython:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      #!/usr/local/bin/python

    In [58]: bar._is_view
    Out[58]: True

    # bar is now a copied object (and will replace the existing dtypes with new ones).
    In [59]: bar['a'] = None

    In [60]: bar.dtypes
    Out[60]:
    a    object
    b     int64
    dtype: object

You should never rely on whether something is a view (even in numpy), except in certain very performance-sensitive situations. It is not a guaranteed construct; it depends on the memory layout of the underlying data.
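That view-versus-copy depends on the operation and on memory layout can be seen in plain NumPy. The sketch below uses only standard NumPy calls (slicing, integer-array indexing, `reshape`, `np.shares_memory`); which of these return views is NumPy behavior, not something pandas promises for DataFrames.

```python
import numpy as np

a = np.arange(6)

s = a[1:4]        # basic slicing: returns a view
f = a[[1, 2, 3]]  # integer-array ("fancy") indexing: always a copy
print(np.shares_memory(a, s), np.shares_memory(a, f))  # True False

m = a.reshape(2, 3)          # contiguous data: reshape can return a view
flat_view = m.reshape(-1)    # still C-contiguous -> view
flat_copy = m.T.reshape(-1)  # transpose is not C-contiguous -> must copy
print(np.shares_memory(a, flat_view), np.shares_memory(a, flat_copy))  # True False

# Writes propagate back to `a` only through views.
s[0] = 99
f[1] = -1
print(a)  # [ 0 99  2  3  4  5]
```

The same-looking call (`reshape(-1)`) gives a view in one case and a copy in the other, purely because of the layout of the input.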

It is very, very rarely a good idea to try to propagate changes through a view, and in pandas doing so almost always causes problems when you mix dtypes. (In numpy a view has only a single dtype; I am not even sure what a view on a multi-dtype array that changes its dtype would do, or whether it is even allowed.)
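In practice there are two usual alternatives: index the original DataFrame directly in a single `.loc` call when you mean to modify it (as the warning in the session above suggests), or take an explicit `.copy()` when you want an independent subset. A minimal sketch, assuming a current pandas release: `.ix` from the question has since been removed from pandas, so `.loc` is used instead, and on this integer index the label slice `:1` selects the same rows that `foo.ix[:1]` did.

```python
import pandas as pd

foo = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

# Intent 1: modify the original -> one .loc call on foo itself.
foo.loc[:1, "a"] = 100

# Intent 2: work on an independent subset -> make the copy explicit.
bar = foo.loc[:1].copy()
bar["a"] = None  # bar's column becomes an object column of None; foo is untouched

print(foo)
print(bar.dtypes)
```

Making the intent explicit avoids relying on whether the slice happens to be a view.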

Source: https://habr.com/ru/post/974846/

