Why does `set_index` create an index label for the column name?

I have a CSV file that starts as follows:

Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998

When I read it into pandas and set the index, the result is not what I expect:

import pandas as pd
df = pd.read_csv('../data/myfile.csv')
df.set_index('Year', inplace=True)
df.head()

Why is there an index entry for the column label, with empty values next to it? Shouldn't it disappear?


In addition, I do not understand how to get the values for 1998. If I try `df.loc['1998']`, I get the error message: `KeyError: 'the label [1998] is not in the [index]'`.

1 answer

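That line is the name of the index, not a row of data. `set_index` keeps the column label as the index name, and pandas prints it on its own line above the index values, which is why the cells next to it are empty. To hide it, set the index name to None: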

df.index.names = [None]
df.head()
#       Boys    Girls
#1996   333490  315995
#1997   329577  313518
#1998   325903  309998
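
To check that the `Year` line really is just the index name, here is a minimal sketch. It assumes the three rows quoted in the question, loaded from an in-memory string instead of the original file; `rename_axis(None)` is shown as a non-mutating alternative to assigning `df.index.names`.

```python
import io
import pandas as pd

# Sample rows copied from the question; the in-memory string stands in
# for '../data/myfile.csv' (an assumption for the sake of a runnable example).
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

df = pd.read_csv(io.StringIO(csv_text)).set_index("Year")

# The "Year" line in the printed frame is the index name, not a data row.
index_name_before = df.index.name  # "Year"
n_rows = len(df)                   # only the three data rows are counted

# rename_axis(None) returns a new DataFrame whose index has no name,
# leaving df itself unchanged.
unnamed = df.rename_axis(None)
index_name_after = unnamed.index.name

print(unnamed.head())
```

The data is identical in both frames; only the label that pandas prints above the index differs.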

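As for getting the data for 1998, drop the quotes. The Year values were parsed as integers, so the index label is the integer 1998, not the string '1998':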

df.loc[1998]
#Boys     325903
#Girls    309998
#Name: 1998, dtype: int64
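
The dtype of the index explains the KeyError. The sketch below is an illustration under the same assumption as the question's sample rows, loaded from an in-memory string; passing `index_col` to `read_csv` is just a shorter way to get the same index as `set_index`.

```python
import io
import pandas as pd

# Sample rows copied from the question (in-memory stand-in for the CSV file).
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Year")

# read_csv parses the Year column as integers, so the index labels are ints.
index_is_integer = pd.api.types.is_integer_dtype(df.index)

# An integer label matches a row.
row = df.loc[1998]
boys_1998 = int(row["Boys"])

# A string label does not exist in an integer index, hence the KeyError.
try:
    df.loc["1998"]
    string_lookup_failed = False
except KeyError:
    string_lookup_failed = True

print(index_is_integer, boys_1998, string_lookup_failed)
```

If string labels are preferred, the index would have to be converted explicitly, for example with `df.index.astype(str)`.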

Source: https://habr.com/ru/post/1657203/