I have a CSV file that isn't read in correctly by pandas.read_csv when I filter columns with usecols and use multiple index columns.
    import pandas as pd

    csv = r"""dummy,date,loc,x
    bar,20090101,a,1
    bar,20090102,a,3
    bar,20090103,a,5
    bar,20090101,b,1
    bar,20090102,b,3
    bar,20090103,b,5"""

    # Write the sample data to disk so read_csv can load it
    with open('foo.csv', 'w') as f:
        f.write(csv)

    df1 = pd.read_csv('foo.csv',
                      index_col=["date", "loc"],
                      usecols=["dummy", "date", "loc", "x"],
                      parse_dates=["date"],
                      header=0,
                      names=["dummy", "date", "loc", "x"])
    print(df1)
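The second read, the one that produces df2, is not reproduced above. A minimal sketch, assuming it is the same call with dummy simply left out of usecols, and using an in-memory buffer (io.StringIO, my substitution for foo.csv) so that it runs on its own:

```python
import io

import pandas as pd

# Same data as foo.csv, held in memory so the snippet is self-contained.
csv_text = """dummy,date,loc,x
bar,20090101,a,1
bar,20090102,a,3
bar,20090103,a,5
bar,20090101,b,1
bar,20090102,b,3
bar,20090103,b,5"""

# Assumed df2 call: identical to the df1 call except that "dummy"
# is omitted from usecols.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    index_col=["date", "loc"],
    usecols=["date", "loc", "x"],
    parse_dates=["date"],
    header=0,
    names=["dummy", "date", "loc", "x"],
)
print(df2)
```

What gets printed depends on the pandas version, so no expected output is shown here.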
I expect df1 and df2 to be the same except for the missing dummy column, but in df2 the columns end up under the wrong labels. Also, the date is treated as a date.
    In [118]: %run test.py
                   dummy  x
    date       loc
    2009-01-01 a     bar  1
    2009-01-02 a     bar  3
    2009-01-03 a     bar  5
    2009-01-01 b     bar  1
    2009-01-02 b     bar  3
    2009-01-03 b     bar  5
                  date
    date loc
    a    1    20090101
         3    20090102
         5    20090103
    b    1    20090101
         3    20090102
         5    20090103
Using column numbers instead of names gives me the same problem. I can work around it by dropping the dummy column after the read_csv step, but I'd like to understand what is going wrong. I am using pandas 0.10.1.
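For reference, a minimal sketch of that workaround: read every column with no usecols filter, then drop dummy afterwards. The names csv_text, full and trimmed are only for illustration, and the data is read from an in-memory buffer rather than foo.csv so the snippet is self-contained.

```python
import io

import pandas as pd

# Same data as foo.csv, held in memory so the snippet is self-contained.
csv_text = """dummy,date,loc,x
bar,20090101,a,1
bar,20090102,a,3
bar,20090103,a,5
bar,20090101,b,1
bar,20090102,b,3
bar,20090103,b,5"""

# Read all columns; no usecols filtering is involved at this step.
full = pd.read_csv(
    io.StringIO(csv_text),
    index_col=["date", "loc"],
    parse_dates=["date"],
)

# Drop the unwanted column after the read instead of filtering during it.
trimmed = full.drop("dummy", axis=1)
print(trimmed)
```

This loads the unwanted column into memory before discarding it, which is fine for a file this small but is exactly the cost usecols is meant to avoid.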
edit: fixed bad header usage.