I ran the following Python code, which creates a Pandas DataFrame with two columns (a and b) and then tries to create two new Series (c and d):
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df.a + df.b
df.d = df.a + df.b
I understand that if a Pandas Series is part of a DataFrame and its name contains no spaces (and does not collide with an existing attribute or method), the Series can be accessed as an attribute of the DataFrame. So I expected line 3 to work (since it creates a new Pandas Series), and I expected line 4 to fail (since d does not exist on the DataFrame until that line is executed).
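The attribute-access rule I am relying on can be illustrated with a small sketch. The frame `demo` and its column names are made up purely for illustration; they are not part of my original code.

```python
import pandas as pd

# Hypothetical frame: one plain name, one name with a space,
# and one name that collides with a DataFrame method.
demo = pd.DataFrame({'a': [1, 2, 3], 'my col': [4, 5, 6], 'sum': [7, 8, 9]})

# A column whose name is a valid identifier is reachable as an attribute.
same_as_bracket = demo.a.equals(demo['a'])

# A name containing a space cannot be written as an attribute; use brackets.
spaced = demo['my col']

# 'sum' collides with the DataFrame.sum method, so the attribute is the
# method, and the column is only reachable with brackets.
sum_is_method = callable(demo.sum)
sum_column = demo['sum']
```

Note that all of these are reads of columns that already exist, which is why I assumed attribute syntax would not work for creating one.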
To my surprise, line 4 did not lead to an error. Instead, the DataFrame now contains three series:
>>> df
   a  b  c
0  1  4  5
1  2  5  7
2  3  6  9
And there is a new object, df.d, which is a Pandas Series:
>>> df.d
0    5
1    7
2    9
dtype: int64
>>> type(df.d)
pandas.core.series.Series
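A few further checks help narrow down what line 4 actually did. This is a minimal sketch rather than a definitive diagnosis: the variable names (`d_in_columns`, `d_in_instance`, `d_values`) are my own, and the warning suppression is there because newer pandas releases emit a UserWarning on the attribute assignment, which my version does not.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df.a + df.b

# Newer pandas releases warn about creating columns via attribute
# assignment; silence that here so the sketch runs quietly.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.d = df.a + df.b

columns = list(df.columns)        # column labels held by the DataFrame
d_in_columns = 'd' in df.columns  # is d registered as a column?
d_in_instance = 'd' in vars(df)   # is d stored as a plain instance attribute?

# Overwrite a source column afterwards and look at df.d again.
df['a'] = [10, 20, 30]
d_values = df.d.tolist()
```

If `d_in_columns` is False while `d_in_instance` is True, that would suggest line 4 set an ordinary Python attribute on the object instead of adding a column, and `d_values` shows whether df.d tracks later changes to the columns.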
My questions are as follows:
- Why didn't line 4 lead to an error?
- Is df.d now a "normal" Pandas Series with all the regular Series functionality?
- Is df.d connected to the df DataFrame in any way, or is it a completely independent object?
I am asking because I want to understand Pandas better, not because I have a specific use case for line 4.
My Python version is 2.7.11 and my Pandas version is 0.17.1.