Reading data from csv to pandas when date and time are in separate columns

Question


I looked at the answer to this question: Parsing dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think that I am doing something wrong.

I have data in CSV files that I am trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to combine them into a single "Datetime" column containing datetime objects. The CSV looks like this:

  Note about the data
  (blank line)
  Site Id,Date,Time,WTEQ.I-1...
  2069, 2008-01-19, 06:00, -99.9...
  2069, 2008-01-19, 07:00, -99.9...
  ...

I am trying to read it using this line of code:

  read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back to CSV, it looks exactly the same (except that the -99.9 values are changed to NA, as specified with the na_values argument): date and time are still in two separate columns. As far as I understand, this should create a new Datetime column from columns 1 and 2, parsed using date_parser. I also tried parse_dates={"Datetime": ["Date", "Time"]}, parse_dates=[[1,2]] and parse_dates=[["Date", "Time"]]. I also tried date_parser=parse, where parse is defined as:

  parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of them made the slightest difference, which makes me suspect that there is a deeper problem. Any insight on what this could be?
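The parse lambda above needs `from datetime import datetime` to run. As a standalone check of its format string, here is a minimal sketch, assuming the Date and Time values reach the parser joined by a single space (the joined sample value is hypothetical, built from the first data row):

```python
from datetime import datetime


def parse(x):
    # Same format string as the lambda above: "YYYY-MM-DD HH:MM".
    return datetime.strptime(x, "%Y-%m-%d %H:%M")


# Hypothetical combined value: Date and Time joined by one space.
combined = "2008-01-19" + " " + "06:00"
print(parse(combined))  # 2008-01-19 06:00:00
```

A string with a leading space, like the text after each comma in the raw rows, does not match this format, and strptime raises ValueError for it.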


python pandas datetime csv

seaotternerd Jul 05 '13 at 16:02


1 answer

Andy hayden · Accepted Answer · 2013-07-05T17:02:07+0000

You need to update pandas; I recommend the latest stable version for the latest features and bug fixes.

This feature was introduced in 0.8.0 and works on pandas version 0.11:

 In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
 Out[11]:
               Datetime  Site Id  WTEQ.I-1
 0  2008-01-19 06:00:00     2069       NaN
 1  2008-01-19 07:00:00     2069       NaN

Note that this call leaves out date_parser=True: date_parser should be a parsing function, not a boolean (see the docstring).
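The same result can be reproduced without a file. This is a sketch, assuming a trimmed, hypothetical copy of the CSV held in a string (only the four columns shown in the question, with the "..." columns dropped). It uses a different technique from the answer: instead of parse_dates={"Datetime": [1,2]}, it concatenates the two text columns and calls pd.to_datetime, because pandas 2.x deprecates both combining columns through parse_dates and the date_parser argument.

```python
import io

import pandas as pd

# Hypothetical, trimmed copy of the file: a note line, a blank line,
# then the header and two data rows (extra columns omitted).
CSV_TEXT = """Note about the data

Site Id,Date,Time,WTEQ.I-1
2069, 2008-01-19, 06:00, -99.9
2069, 2008-01-19, 07:00, -99.9
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=2,             # skip the note line and the blank line
    skipinitialspace=True,  # drop the space after each comma
    na_values=["-99.9"],    # treat the -99.9 sentinel as missing
)

# Join the two text columns and parse them in one vectorized call.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M")
df = df.drop(columns=["Date", "Time"])

print(df)
```

Joining the strings first keeps the parsing in a single pd.to_datetime call with an explicit format, so the result does not depend on how a particular pandas version handles parse_dates or date_parser.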

Note that in the example above, "Datetime" is an ordinary column (a Series), not the DataFrame's index. If you want the datetime values as the index instead of the default integer index, pass index_col to select that column; here that is 0, since "Datetime" is the first column of the result.

 In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
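If the frame has already been read with "Datetime" as an ordinary column, set_index gives the same kind of index after the fact. A minimal sketch, assuming a small frame built by hand from the two rows shown in the question, that also shows what a DatetimeIndex allows (selection by date/time strings):

```python
import pandas as pd

# A frame shaped like the parsed result, using the two rows from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2008-01-19 06:00", "2008-01-19 07:00"]),
    "Site Id": [2069, 2069],
    "WTEQ.I-1": [float("nan"), float("nan")],
})

# Move the column into the index, much as index_col=0 does at read time.
indexed = df.set_index("Datetime")

# A DatetimeIndex supports label-based slicing with date/time strings.
later = indexed.loc["2008-01-19 06:30":"2008-01-19 08:00"]
print(later)
```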

