Why should I make a copy of a DataFrame in pandas?

Question

When selecting a sub-DataFrame from a parent DataFrame, I noticed that some programmers make a copy of the data frame using the .copy() method.

Why are they making a copy of the data frame? What happens if I do not make a copy?

+48

pandas chained-assignment

Elizabeth Susan Joseph Dec 28 '14 at 2:22

3 answers

Because if you don't make a copy, the data and index can still be modified elsewhere, even if you assign the DataFrame to a different name. Assignment only binds a second name to the same object.

For example:

    df2 = df
    func1(df2)
    func2(df)

func1 can change df by modifying df2. To avoid this:

    df2 = df.copy()
    func1(df2)
    func2(df)
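
As a minimal sketch of the difference, the snippet below uses a hypothetical func1 that overwrites a column in place. The answer does not say what func1 actually does, so its body, the column name x, and the values are illustrative assumptions. It shows that plain assignment shares one object, while .copy() produces an independent one.

```python
import pandas as pd


def func1(frame):
    # Hypothetical stand-in for func1: overwrites column "x" in place.
    frame.loc[:, "x"] = 0


df = pd.DataFrame({"x": [1, 2, 3]})

df2 = df                    # no copy: df2 is another name for the same object
alias_is_same = df2 is df

df3 = df.copy()             # independent copy of the data and index
func1(df3)                  # modifies only the copy
df_after_copy_call = df["x"].tolist()

func1(df2)                  # modifies the object that df also refers to
df_after_alias_call = df["x"].tolist()

print(alias_is_same, df_after_copy_call, df_after_alias_call)
# prints: True [1, 2, 3] [0, 0, 0]
```

The copy costs extra memory, so it is usually worth making only when the original must stay intact.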

+13

sparrow Sep 22 '16 at 1:27

Note that whether a copy or a view is returned depends on the kind of indexing.

The pandas documentation says:

Returning a view versus a copy
The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector is involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
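
One way to check this on your own installation is to ask NumPy whether the result's data overlaps the original's. The sketch below is illustrative only: the frame, the column name A, and the use of np.shares_memory are assumptions and are not part of the quoted documentation. It uses .iloc in place of .ix, which is no longer available in current pandas. The slice result is printed rather than asserted because it depends on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40]})
original = df["A"].to_numpy()

# Boolean vector indexing: the result holds its own data (a copy).
masked = df[df["A"] > 15]
mask_shares = np.shares_memory(original, masked["A"].to_numpy())

# Slice indexing: may be a view onto the original data, depending on version.
sliced = df.iloc[1:3]
slice_shares = np.shares_memory(original, sliced["A"].to_numpy())

print("boolean mask shares memory:", mask_shares)
print("slice shares memory:", slice_shares)
```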

+2

Gusev Slava Jan 20 '17 at 13:22

cgold · Accepted Answer · 2014-12-28 20:01

This extends Paul's answer. In pandas, indexing a DataFrame can return a view of the original DataFrame rather than a copy. In that case, changing the subset changes the original DataFrame. So make a copy if you want to be sure the original DataFrame is not changed. Consider the following code:

    from pandas import DataFrame

    df = DataFrame({'x': [1, 2]})
    df_sub = df[0:1]  # slice containing only the first row
    df_sub.x = -1
    print(df)

You'll get:

       x
    0 -1
    1  2

In contrast, the following leaves df unchanged:

    df_sub_copy = df[0:1].copy()
    df_sub_copy.x = -1
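
Whether the write in the df_sub example reaches df depends on the installed pandas version and its copy/view rules, so it can help to test it directly. The following is a rough sketch, not a definitive check. The helper name write_reaches_parent is hypothetical, and the warning filter is only there to silence the chained-assignment warning that some versions emit.

```python
import warnings

import pandas as pd


def write_reaches_parent(make_subset):
    """Return True if writing to the subset changes the parent frame."""
    df = pd.DataFrame({"x": [1, 2]})
    before = df.copy()
    df_sub = make_subset(df)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some versions warn on this write
        df_sub["x"] = -1
    return not df.equals(before)


# Version-dependent: may be True or False on your installation.
print("slice without copy:", write_reaches_parent(lambda d: d[0:1]))

# An explicit copy is independent of the parent, so this prints False.
print("slice with .copy():", write_reaches_parent(lambda d: d[0:1].copy()))
```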
