Pandas view vs. copy: do the docs really say nobody knows?

There are many questions on StackOverflow about chained indexing and whether a particular operation produces a view or a copy (e.g. here or here). I still do not fully understand this, but the surprising part is that the official docs say "nobody knows" (!?!??). Here is an example from the docs; can you tell me whether they really meant it, or whether they were just being flippant?

From http://pandas-docs.imtqy.com/pandas-docs-travis/indexing.html?highlight=view#why-does-assignment-fail-when-using-chained-indexing

def do_something(df):
   foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
   # ... many lines here ...
   foo['quux'] = value       # We don't know whether this will modify df or not!
   return foo

Seriously? For this particular example, is it really true that "nobody knows", i.e. that the result is non-deterministic? Will it behave differently on two different DataFrames? Are the rules really that complicated? Or did the author mean that there is a definite answer, but most people simply do not know it?

+4
3 answers

I think I can demonstrate something that clarifies your situation. In your example, the selection will initially be a view, but as soon as you try to modify it by adding a column, it turns into a copy. You can verify this by checking the ._is_view attribute:

In [29]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
def doSomething(df):
    a = df[['b','c']]
    print('before ', a._is_view)
    a['d'] = 0
    print('after ', a._is_view)

doSomething(df)
df

before  True
after  False
Out[29]:
          a         b         c
0  0.108790  0.580745  1.820328
1  1.066503 -0.238707 -0.655881
2 -1.320731  2.038194 -0.894984
3 -0.962753 -3.961181  0.109476
4 -1.887774  0.909539  1.318677

, , a df, , , , df .
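If you do not want to depend on whether the selection is a view or a copy, one option is to ask for a copy explicitly. The following is only a minimal sketch: the function name do_something_explicit and the sample data are my own assumptions, not taken from the docs. With an explicit .copy(), adding a column to the result cannot touch the original frame.

```python
import numpy as np
import pandas as pd


def do_something_explicit(df):
    # Hypothetical variant of doSomething above: take an explicit copy,
    # so the result is independent of df whatever the indexing returned.
    a = df[['b', 'c']].copy()
    a['d'] = 0  # modifies only the copy
    return a


df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'))
result = do_something_explicit(df)

print(list(df.columns))      # original frame keeps its three columns
print(list(result.columns))  # the copy has the extra column 'd'
```

The cost is an extra allocation for the selected columns, which is usually acceptable when you intend to modify the result anyway.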

+5

, , , , , :

, ( , pandas )

, numpy, . pandas , - . , , - , , .
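As an illustration of the numpy side of this, here is a small sketch (my own example, not from the pandas docs) of how numpy itself distinguishes the two cases: basic slicing returns a view on the same memory, while indexing with a list of integers returns a copy. np.shares_memory reports which one you got.

```python
import numpy as np

arr = np.arange(10)

basic = arr[2:5]        # basic slicing: a view on arr's memory
fancy = arr[[2, 3, 4]]  # integer-array indexing: a fresh copy

shares_basic = np.shares_memory(arr, basic)
shares_fancy = np.shares_memory(arr, fancy)

basic[0] = 100  # writes through to arr[2]
fancy[1] = -1   # leaves arr[3] untouched

print(shares_basic, shares_fancy)
print(arr[2], arr[3])
```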

+4

, , , .

DataFrame, . , , , .

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 100))
# Keep only the rows that contain at least one value above 2
x = df[(df > 2).any(axis=1)]
print(x._is_view)  # reported output: True

# Prove that below we are referring to the exact same slice of the dataframe
assert (x.iloc[:len(x), 1] == x.iloc[:, 1]).all()

# Assign using equivalent notation to below
x.iloc[:len(x), 1] = 1
print(x._is_view)  # reported output: True

# Assign using slightly different syntax
x.iloc[:, 1] = 1
print(x._is_view)  # reported output: False
0

Source: https://habr.com/ru/post/1652190/

