Why should I make a copy of a data frame in pandas?

When selecting a sub-DataFrame from a parent DataFrame, I noticed that some programmers make a copy of the data frame using the .copy() method.

Why are they making a copy of the data frame? What happens if I do not make a copy?

pandas chained-assignment
Dec 28 '14 at 2:22
3 answers

This extends Paul's answer. In pandas, indexing a DataFrame can return a view that refers to the original DataFrame's data rather than an independent copy. Changing the subset can then change the original DataFrame, so use a copy when you want to make sure the original DataFrame is not changed. Consider the following code:

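    from pandas import DataFrame

    df = DataFrame({'x': [1, 2]})
    df_sub = df[0:1]    # first row only; may be a view on df's data
    df_sub['x'] = -1    # same as df_sub.x = -1 for an existing column
    print(df)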

With the pandas versions of the time, you'll get:

       x
    0 -1
    1  2

In contrast, the following leaves df unchanged:

    df_sub_copy = df[0:1].copy()
    df_sub_copy['x'] = -1
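A minimal self-contained sketch of the copy variant, assuming a current pandas imported as `pd`. Because `.copy()` defaults to `deep=True`, the subset gets its own memory, so the write cannot reach `df` in any pandas version. The first example is version-dependent: with Copy-on-Write (opt-in in pandas 2.x via `pd.options.mode.copy_on_write = True`, and the default in pandas 3.0), writes to a subset no longer propagate back to the parent.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2]})

# .copy() (deep=True by default) allocates new memory for the subset,
# so assigning to it can never modify df.
df_sub_copy = df[0:1].copy()
df_sub_copy['x'] = -1

print(df['x'].tolist())           # [1, 2]
print(df_sub_copy['x'].tolist())  # [-1]
```

Making the copy explicit also documents the intent: the subset is meant to be an independent object, not a window onto `df`.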
Dec 28 '14 at 8:01

Because if you don't make a copy, the data can still be modified through another name: assigning a DataFrame to a different variable does not copy it.

For example:

    df2 = df
    func1(df2)
    func2(df)

func1 can change df by modifying df2, because both names refer to the same object. To avoid this:

    df2 = df.copy()
    func1(df2)
    func2(df)
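A small sketch of the aliasing problem, assuming pandas imported as `pd`. The `func1` here is a hypothetical stand-in for the answer's `func1`, written to mutate its argument in place; the variable names are also illustrative. Plain assignment makes both names point at one object, while `.copy()` hands the function an independent DataFrame.

```python
import pandas as pd

def func1(frame):
    # Hypothetical helper that mutates the DataFrame it receives.
    frame['a'] = 0

# Case 1: plain assignment binds a second name to the *same* object.
df_shared = pd.DataFrame({'a': [1, 2, 3]})
alias = df_shared
func1(alias)
print(alias is df_shared)          # True
print(df_shared['a'].tolist())     # [0, 0, 0]  (changed through the alias)

# Case 2: .copy() creates a separate object, so the original is untouched.
df_safe = pd.DataFrame({'a': [1, 2, 3]})
safe_copy = df_safe.copy()
func1(safe_copy)
print(safe_copy is df_safe)        # False
print(df_safe['a'].tolist())       # [1, 2, 3]
```

This is ordinary Python reference semantics rather than anything pandas-specific, which is why the copy-on-write changes in newer pandas do not help in case 1.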
Sep 22 '16 at 1:27

Note that whether a copy or a view is returned depends on the kind of indexing.

The pandas documentation says:

Returning a view versus a copy

The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector is involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
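A rough way to check these rules yourself, assuming a current pandas plus NumPy. `.ix` was removed in pandas 1.0, so this sketch uses `.loc` instead, and it relies on NumPy's `np.shares_memory` to test whether a subset's column is backed by the original's memory. The helper name `shares_data` is hypothetical. The boolean-mask and list-of-labels cases should always be copies; the slice result is left unannotated because view-vs-copy for slicing is an implementation detail that can differ between versions and under Copy-on-Write.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def shares_data(sub, col='A'):
    # Hypothetical helper: True if sub[col] uses the same memory as df[col].
    return np.shares_memory(df[col].to_numpy(), sub[col].to_numpy())

# Label slicing (the .loc equivalent of df.ix[0:1]): typically a view.
sliced = df.loc[0:1]
print(shares_data(sliced))

# Boolean vector: NumPy fancy indexing, so the result is a copy.
masked = df[df['A'] > 1]
print(shares_data(masked))   # False

# Array of labels: also a copy.
picked = df.loc[[0, 2]]
print(shares_data(picked))   # False
```

If you need a guaranteed independent result regardless of the indexing style, calling `.copy()` on it removes the guesswork.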

Jan 20 '17 at 13:22


