I am trying to keep only certain columns of a DataFrame, selecting them by position. This works fine when the column names are strings:
In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]:
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]]
However, when the column names are integers, I get a key error:
In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]:
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]
Results in:
KeyError: '[1 3] not in index'
I see why pandas does not allow this: it avoids ambiguity between indexing by column name and indexing by column number. However, is there a way to tell pandas that I want to index by column number? Of course, one solution is to convert the column names to strings, but I'm wondering if there is a better solution.
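For reference, the behaviour I am after looks like what positional indexing with `iloc` gives. This is only a minimal sketch, assuming a pandas version that provides `DataFrame.iloc`. The names `by_position` and `by_label` are mine, for illustration only:

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Positional selection: all rows, columns at positions 1 and 3,
# regardless of what the column labels are.
by_position = df.iloc[:, [1, 3]]

# Alternative: look up the labels at those positions, then select by label.
by_label = df[df.columns[[1, 3]]]

print(by_position)
```

Here positions 1 and 3 correspond to the columns labelled 11 and 13, so both selections should return the same two columns without renaming anything.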