Reading data from csv to pandas when date and time are in separate columns

I looked at the answer to this question: Parsing dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think that I am doing something wrong.

I have data in CSV files that I am trying to read with the pandas read_csv function. Date and time are in two separate columns, but I want to combine them into a single "Datetime" column containing datetime objects. The CSV looks like this:

  Note about the data
  blank line
  Site Id,Date,Time,WTEQ.I-1...
  2069, 2008-01-19, 06:00, -99.9...
  2069, 2008-01-19, 07:00, -99.9...
  ...

I am trying to read it using this line of code:

  read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"]) 

However, when I write it back to CSV, it looks exactly the same (except that the -99.9 values are changed to NA, as I specified with the na_values argument): date and time are still in two separate columns. As far as I understand, this should create a new Datetime column made from columns 1 and 2 and parsed with date_parser. I also tried parse_dates={"Datetime": ["Date", "Time"]}, parse_dates=[[1,2]] and parse_dates=[["Date", "Time"]]. I also tried date_parser=parse, where parse is defined as:

  from datetime import datetime
  parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')
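A quick standalone check of the format string used by `parse` may help rule out the parser itself. This is a sketch: the `.strip()` step is an addition, not part of the question's code. In the sample data each field follows a comma and a space, so a raw value can carry a leading space, and `strptime` rejects a leading space with this format because `%Y` must match at the very start.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def parse(text: str) -> datetime:
    """Parse a combined 'YYYY-MM-DD HH:MM' string, ignoring surrounding whitespace."""
    return datetime.strptime(text.strip(), FMT)


print(parse("2008-01-19 06:00"))   # 2008-01-19 06:00:00
print(parse(" 2008-01-19 06:00"))  # leading space is stripped before parsing

# Without stripping, the same input does not match the format.
try:
    datetime.strptime(" 2008-01-19 06:00", FMT)
except ValueError as exc:
    print("unstripped input fails:", exc)
```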

None of them made the slightest difference, which makes me suspect that there is a deeper problem. Any insight on what this could be?

1 answer

You need to update pandas; I recommend the latest stable version for the newest features and bug fixes.

This feature was introduced in 0.8.0 and works in pandas 0.11:

 In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
 Out[11]:
              Datetime  Site Id  WTEQ.I-1
 0 2008-01-19 06:00:00     2069       NaN
 1 2008-01-19 07:00:00     2069       NaN

Note that this call omits date_parser=True: date_parser is expected to be a parsing function, not a boolean (see the docstring).
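The answer targets pandas 0.11. In recent pandas 2.x releases, `date_parser` is deprecated in favor of `date_format`, and combining several columns through `parse_dates` is deprecated as well, so a version-independent alternative is to read the columns as plain text and join them afterwards. The sketch below assumes the file contains only the four columns shown in the question and embeds a small sample through `io.StringIO` instead of reading the real `2069_ALL_YEAR=2008.csv`. `skipinitialspace=True` is an addition not in the original; it drops the space that follows each comma in the data.

```python
from io import StringIO

import pandas as pd

# Sample shaped like the question's file: a note line, a blank line, then the header.
# Only the four columns visible in the question are included (an assumption).
RAW = """Note about the data

Site Id,Date,Time,WTEQ.I-1
2069, 2008-01-19, 06:00, -99.9
2069, 2008-01-19, 07:00, -99.9
"""

df = pd.read_csv(
    StringIO(RAW),
    skiprows=2,             # skip the note line and the blank line
    skipinitialspace=True,  # drop the space after each comma
    na_values=["-99.9"],    # treat the -99.9 sentinel as missing
)

# Join the two text columns and parse them with an explicit format.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M")

# Replace Date/Time with a single Datetime column placed first.
df = df.drop(columns=["Date", "Time"])
df.insert(0, "Datetime", stamp)
print(df)
```

Parsing with an explicit `format` avoids relying on pandas to guess the layout, and the same code should run on both older and newer pandas versions.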

Note that in the example above, the resulting "Datetime" column is an ordinary column (a Series), not the DataFrame's index. If you prefer the datetime values as the index instead of the default integer index, pass the index_col argument with the desired column, in this case 0, since "Datetime" is the first column:

 In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"]) 
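If the frame has already been read, moving the column into the index afterwards with `set_index` should give the same layout as `index_col=0`. A minimal sketch, using a small hypothetical DataFrame shaped like the answer's output rather than the real file:

```python
import pandas as pd

# Hypothetical frame shaped like the answer's output: "Datetime" is an ordinary column.
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(["2008-01-19 06:00", "2008-01-19 07:00"]),
        "Site Id": [2069, 2069],
        "WTEQ.I-1": [float("nan"), float("nan")],
    }
)
print(type(df.index).__name__)  # RangeIndex: rows are labelled 0, 1, ...

# Move the timestamps into the row index, mirroring index_col=0 at read time.
indexed = df.set_index("Datetime")
print(type(indexed.index).__name__)  # DatetimeIndex

# Rows can now be looked up by timestamp.
print(indexed.loc[pd.Timestamp("2008-01-19 07:00"), "Site Id"])  # 2069
```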

Source: https://habr.com/ru/post/948803/