Pandas DataFrame index by column numbers when column names are integers

Question

I am trying to keep only certain columns of a DataFrame, and it works fine when the column names are strings:

    In [2]: import numpy as np

    In [3]: import pandas as pd

    In [4]: a = np.arange(35).reshape(5,7)

    In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

    In [6]: df
    Out[6]:
        a   b   c   d   e   f   g
    x   0   1   2   3   4   5   6
    y   7   8   9  10  11  12  13
    u  14  15  16  17  18  19  20
    z  21  22  23  24  25  26  27
    w  28  29  30  31  32  33  34

    [5 rows x 7 columns]

    In [7]: df[[1,3]]  # No problem
    Out[7]:
        b   d
    x   1   3
    y   8  10
    u  15  17
    z  22  24
    w  29  31

However, when the column names are integers, I get a KeyError:

    In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

    In [9]: df
    Out[9]:
       10  11  12  13  14  15  16
    x   0   1   2   3   4   5   6
    y   7   8   9  10  11  12  13
    u  14  15  16  17  18  19  20
    z  21  22  23  24  25  26  27
    w  28  29  30  31  32  33  34

    [5 rows x 7 columns]

    In [10]: df[[1,3]]

Results in:

 KeyError: '[1 3] not in index'

I see why pandas does not allow this: it avoids ambiguity between indexing by column name and indexing by column position. However, is there a way to tell pandas that I want to index by column position? Of course, one solution is to convert the column names to strings, but I'm wondering if there is a better solution.

+5

python pandas

Akavall Nov 26 '14 at 18:19

2 answers

This is certainly one of those things that looks like a bug but is actually a design decision (I think).

A few workarounds:

Rename the columns so that each column's name is its position:

    df.columns = np.arange(len(df.columns))

Another way is to look up the names in df.columns by position and index with those:

    print(df[df.columns[[1, 3]]])

       11  13
    x   1   3
    y   8  10
    u  15  17
    z  22  24
    w  29  31

I suspect this is the most attractive option, since it only requires adding a small bit of code and does not change the column names.
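For anyone who wants to try both workarounds side by side, here is a minimal self-contained sketch. It rebuilds the frame from the question. The names `labels`, `subset` and `renamed` are only illustrative, and the rename is done on a copy (my choice, not part of the answer) so the original integer labels survive.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: integer column labels 10..16.
a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Workaround 2: translate positions 1 and 3 into the labels stored there,
# then do an ordinary label-based lookup.
labels = df.columns[[1, 3]]
subset = df[labels]
print(labels.tolist())  # [11, 13]
print(subset)

# Workaround 1, applied to a copy so df keeps its original labels:
# after the rename, label and position coincide.
renamed = df.copy()
renamed.columns = np.arange(len(renamed.columns))
print(renamed[[1, 3]])
```

Both approaches select the same data; the first leaves the column labels as 11 and 13, while the renamed copy reports them as 1 and 3.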

+2

Jd long Nov 26 '14 at 18:29

Jeff · Accepted Answer · 2014-11-26T21:13:32+0000

This is exactly the purpose of iloc, which selects rows and columns by integer position regardless of their labels:

    In [37]: df
    Out[37]:
       10  11  12  13  14  15  16
    x   0   1   2   3   4   5   6
    y   7   8   9  10  11  12  13
    u  14  15  16  17  18  19  20
    z  21  22  23  24  25  26  27
    w  28  29  30  31  32  33  34

    In [38]: df.iloc[:,[1,3]]
    Out[38]:
       11  13
    x   1   3
    y   8  10
    u  15  17
    z  22  24
    w  29  31
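As a runnable version of the session above, the sketch below contrasts position-based `iloc` with label-based `loc` on the frame from the question. The variable names (`by_position`, `by_label`, `block`) are illustrative, and the slice example is an extra not taken from the answer.

```python
import numpy as np
import pandas as pd

# Frame from the question: string row labels, integer column labels 10..16.
a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# iloc addresses by position: second and fourth columns, whatever they are called.
by_position = df.iloc[:, [1, 3]]

# loc addresses by label: the same two columns happen to be named 11 and 13.
by_label = df.loc[:, [11, 13]]

print(by_position.equals(by_label))  # True

# iloc also takes positional slices on both axes (end-exclusive, like Python lists).
block = df.iloc[1:3, 2:5]  # rows 'y' and 'u', columns 12, 13, 14
print(block)
```

Because `iloc` never consults the labels, the same call works unchanged whether the columns are named with strings, integers, or anything else.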
