Why does `set_index` create an index label for the column name?

Question

I have a CSV file that starts as follows:

Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998

When I read it in pandas and set the index, it does not do what I expect:

import pandas as pd

df = pd.read_csv('../data/myfile.csv')
df.set_index('Year', inplace=True)  # use the Year column as the row index
df.head()

Why does the output show an extra index entry for the column label `Year`, with empty values next to it? Shouldn't that label disappear?
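A minimal reproduction, assuming the sample rows above are the whole file. `io.StringIO` and the name `csv_text` are stand-ins introduced here for `'../data/myfile.csv'`, so the snippet runs without the file:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for '../data/myfile.csv'.
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

df = pd.read_csv(io.StringIO(csv_text))
df.set_index('Year', inplace=True)

# set_index keeps the column name as the index's name; the extra "Year"
# line in the printed frame is that name, not an additional row of data.
print(df.index.name)  # Year
print(len(df))        # 3
print(df.head())
```

The frame still has exactly three rows; the label is metadata attached to the index.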

In addition, I do not understand how to get the values for 1998. If I try `df.loc['1998']`, I get the error message: `KeyError: 'the label [1998] is not in the [index]'`.
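A sketch of why the lookup fails, again assuming the sample rows and using a hypothetical `csv_text` string in place of the file. Checking the index dtype and membership shows the labels are integers, so a string key is not found:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV.
csv_text = "Year,Boys,Girls\n1996,333490,315995\n1997,329577,313518\n1998,325903,309998\n"
df = pd.read_csv(io.StringIO(csv_text)).set_index('Year')

# read_csv inferred an integer dtype for Year, so the index labels are ints.
print(df.index.dtype)       # int64
print('1998' in df.index)   # False: the string '1998' is not a label
print(1998 in df.index)     # True: the integer 1998 is

raised = False
try:
    df.loc['1998']  # string key on an integer index
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```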


Richard Oct 9 '16 at 15:19


1 answer

mtoto · Accepted Answer · 2016-10-09T15:29:42+0000

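The extra line is not a row of data: it is the index's name, which `set_index` copies from the `Year` column, and pandas prints it on its own line above the index values. If you don't want it displayed, set the index name to `None`: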

df.index.names = [None]
df.head()
#       Boys    Girls
#1996   333490  315995
#1997   329577  313518
#1998   325903  309998
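Two other ways to clear the name, sketched under the same assumption (hypothetical `csv_text` in place of the file). `df.index.name = None` is the single-level equivalent of `df.index.names = [None]`, and `rename_axis(index=None)` returns a new frame, which suits method chaining:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV.
csv_text = "Year,Boys,Girls\n1996,333490,315995\n1997,329577,313518\n1998,325903,309998\n"
df = pd.read_csv(io.StringIO(csv_text)).set_index('Year')

# Option 1: assign the single index name directly (modifies df1 in place).
df1 = df.copy()
df1.index.name = None

# Option 2: rename_axis returns a new DataFrame whose index has no name.
df2 = df.rename_axis(index=None)

print(df1.index.name, df2.index.name)  # None None
print(df.index.name)                   # Year (the original is unchanged)
```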

As for getting the data for 1998: `read_csv` parsed `Year` as integers, so the index labels are ints, not strings. Just lose the quotes:

df.loc[1998]
#Boys     325903
#Girls    309998
#Name: 1998, dtype: int64
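If string labels are preferred instead, one option is to ask `read_csv` to keep `Year` as text via its `dtype` argument. A sketch comparing both, with a hypothetical `csv_text` standing in for the file:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV.
csv_text = "Year,Boys,Girls\n1996,333490,315995\n1997,329577,313518\n1998,325903,309998\n"

# Default inference: integer labels, so look up with an int.
df = pd.read_csv(io.StringIO(csv_text)).set_index('Year')
row_int = df.loc[1998]

# Keep Year as strings, so a quoted key works.
df_str = pd.read_csv(io.StringIO(csv_text), dtype={'Year': str}).set_index('Year')
row_str = df_str.loc['1998']

print(row_int['Boys'], row_str['Boys'])  # 325903 325903
```

Whichever dtype you pick, the key passed to `.loc` has to match the type of the index labels.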
