Creating a Pandas Series with a period in the name

I ran the following Python code, which creates a Pandas DataFrame with two columns (a and b) and then tries to create two new Series (c and d):

    import pandas as pd
    df = pd.DataFrame({'a':[1, 2, 3], 'b':[4, 5, 6]})
    df['c'] = df.a + df.b
    df.d = df.a + df.b

I understand that if a Pandas Series is part of a DataFrame and the Series name does not contain spaces (and does not collide with an existing attribute or method), the Series can be accessed as a DataFrame attribute. So I expected line 3 to work (since that is how you create a new Pandas Series), and I expected line 4 to fail (since d does not exist as an attribute of the DataFrame until that line is executed).

To my surprise, line 4 did not lead to an error. Instead, the DataFrame now contains three series:

    >>> df
       a  b  c
    0  1  4  5
    1  2  5  7
    2  3  6  9

And there is a new df.d object, which is a Pandas Series:

    >>> df.d
    0    5
    1    7
    2    9
    dtype: int64
    >>> type(df.d)
    pandas.core.series.Series

My questions are as follows:

  • Why didn't line 4 lead to an error?
  • Is df.d now a “normal” Pandas series with all the regular functionality of the Series?
  • Is df.d connected to the df DataFrame in any way, or is it a completely independent object?

The motivation for asking this question is that I want to understand Pandas better, and not because there is a specific use case for line 4.

My Python version is 2.7.11 and my Pandas version is 0.17.1.

+5
2 answers

When making an assignment, you need to use bracket notation, for example df['d'] = ...

d is now a property of the DataFrame df. As with any Python object, you can assign new properties to it, which is why the assignment did not raise an error. It simply did not behave the way you expected:

    >>> df.some_property = 'What?'
    >>> df.some_property
    'What?'

This is a common source of confusion for Pandas beginners. Always use bracket notation for assignment. Dot notation is a convenience for accessing the fields of a DataFrame or Series. To be safe, you can always use bracket notation.
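To illustrate why bracket notation is the safer default, here is a minimal sketch. The column names `count` and `my col` are made up for the example: the first collides with an existing DataFrame method and the second contains a space, so neither can be reached through dot notation.

```python
import pandas as pd

# Hypothetical columns chosen to show where dot notation breaks down.
df = pd.DataFrame({'count': [1, 2], 'my col': [3, 4], 'a': [5, 6]})

# 'count' collides with the DataFrame.count method, so the dot form
# returns the method rather than the column.
dot_is_method = callable(df.count)

# Bracket notation always returns the column.
bracket_values = df['count'].tolist()

# A name with a space has no dot-notation equivalent at all.
spaced_values = df['my col'].tolist()

# For a simple name with no collision, both forms give the same data.
same_series = df.a.equals(df['a'])

print(dot_is_method, bracket_values, spaced_values, same_series)
```

Bracket notation works in all three cases, which is why it is the form to reach for when in doubt.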

And yes, df.d in your example is a normal Series, which is now an unexpected property of the DataFrame. This Series is its own object, connected to df only by the reference you created when you assigned it to df.
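The difference between the two assignments can be checked directly. The following is a minimal sketch using the DataFrame from the question. Newer pandas releases emit a UserWarning on the attribute assignment (the asker's 0.17.1 did not), so the warning is silenced here purely to keep the demonstration quiet.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Bracket assignment creates a real column.
df['c'] = df['a'] + df['b']

# Attribute assignment to a name that is not yet a column only sets a
# plain Python attribute on the df object.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # newer pandas warns on this line
    df.d = df['a'] + df['b']

columns = list(df.columns)           # 'd' is not among the columns
d_is_column = 'd' in df.columns
d_is_series = isinstance(df.d, pd.Series)

print(columns, d_is_column, d_is_series)
```

Only `c` shows up among the columns, while `df.d` is still an ordinary Series that just happens to hang off the DataFrame object.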

+6

@Alexander's answer is good. But just to clarify, this behavior is not specific to Pandas but rather to Python itself; see this related question:

Why is adding attributes to an already instantiated object allowed in Python?
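The same behavior can be reproduced without Pandas at all. Below is a minimal sketch in plain Python. The class names `Plain` and `Restricted` are made up for the illustration; `__slots__` is shown only as one way a class can opt out of accepting new attributes.

```python
class Plain:
    """An ordinary class with no attribute restrictions."""


class Restricted:
    """A class that declares __slots__, so undeclared attributes are rejected."""
    __slots__ = ('x',)


p = Plain()
p.some_property = 'What?'        # stored in the instance's __dict__

r = Restricted()
r.x = 1                          # allowed: 'x' is declared in __slots__
try:
    r.some_property = 'What?'    # not declared, so this raises
    slots_allowed = True
except AttributeError:
    slots_allowed = False

print(p.__dict__, slots_allowed)
```

A DataFrame behaves like `Plain` here: an instance of an ordinary class accepts new attributes unless the class takes steps to prevent it.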

As for your last question, the Series is not connected to the DataFrame (depending on what you mean by connected, though). Consider:

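    >>> import pandas as pd
    >>> df = pd.DataFrame({'a':[1, 2, 3], 'b':[4, 5, 6]})
    >>> df.d = df.a + df.b
    >>> # sort_values replaces the deprecated df.sort (available since pandas 0.17.0)
    >>> df.sort_values("a", ascending=False, inplace=True)
    >>> df
       a  b
    2  3  6
    1  2  5
    0  1  4
    >>> df.d
    0    5
    1    7
    2    9
    dtype: int64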

So df.d was not sorted, but df.a and df.b were.
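For contrast, here is a minimal sketch of the same experiment with a real column `c` next to the attribute `d`. It assumes a pandas version that provides `sort_values`, and it silences the UserWarning that newer releases emit on the attribute assignment.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df['a'] + df['b']            # real column
with warnings.catch_warnings():
    warnings.simplefilter('ignore')    # newer pandas warns on the next line
    df.d = df['a'] + df['b']           # plain attribute

df.sort_values('a', ascending=False, inplace=True)

column_values = df['c'].tolist()       # reordered together with the frame
attribute_values = df.d.tolist()       # still in the original order

# Operations that return a new DataFrame do not carry the attribute over.
copy_has_d = hasattr(df.copy(), 'd')

print(column_values, attribute_values, copy_has_d)
```

The column moves with its rows while the attribute stays as it was, and a copy of the DataFrame no longer has `d` at all.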

+1

Source: https://habr.com/ru/post/1244593/

