to_datetime ValueError: at least [year, month, day] must be specified (pandas)

I read from two different CSVs, each of which has date values in its columns. After read_csv, I want to convert the data to datetime using the to_datetime method. The date formats in the two CSVs differ slightly, and although I account for the difference in the format argument of to_datetime, one CSV converts fine and the other raises the following ValueError:

 ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

First, dte.head():

 0  10/14/2016  10/17/2016  10/19/2016    8/9/2016  10/17/2016   7/20/2016
 1   7/15/2016   7/18/2016   7/20/2016    6/7/2016   7/18/2016   4/19/2016
 2   4/15/2016   4/14/2016   4/18/2016   3/15/2016   4/18/2016   1/14/2016
 3   1/15/2016   1/19/2016   1/19/2016  10/19/2015   1/19/2016  10/13/2015
 4  10/15/2015  10/14/2015  10/19/2015   7/23/2015  10/14/2015   7/15/2015

This DataFrame converts fine using the following code:

 dte = pd.to_datetime(dte, infer_datetime_format=True) 

or

 dte = pd.to_datetime(dte[x], format='%m/%d/%Y') 

Second, dtd.head():

 0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
 1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06

This CSV does not convert using either:

 dtd = pd.to_datetime(dtd, infer_datetime_format=True) 

or

 dtd = pd.to_datetime(dtd, format='%Y-%m-%d') 

Both raise the ValueError shown above. Interestingly, however, passing parse_dates and infer_datetime_format as arguments to read_csv works fine. What's going on here?

+10
3 answers

You can stack / pd.to_datetime / unstack

 pd.to_datetime(dte.stack()).unstack() 


Explanation
pd.to_datetime works with a string, a list, or a pd.Series. dte is a pd.DataFrame, and that is why you are having problems: given a DataFrame, pd.to_datetime tries to assemble a single datetime column from columns named year, month, and day, which is what the error message is complaining about. dte.stack() creates a pd.Series in which all the rows are stacked on top of one another. In this stacked form, because it is a pd.Series, the vectorized pd.to_datetime can work on it. The subsequent unstack simply reverses the initial stack to restore the original shape of dte.
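To make the explanation concrete, here is a minimal sketch. The frames and the column names "a" and "b" are hypothetical stand-ins for the data in the question, and the explicit format argument is optional. It shows the failing DataFrame call, the year/month/day assembly that pandas is attempting, and the stack / unstack round trip.

```python
import pandas as pd

# Hypothetical sample shaped like dtd: every cell is a date string.
dtd = pd.DataFrame({
    "a": ["2004-01-02", "2004-01-05"],
    "b": ["2004-01-02", "2004-01-09"],
})

# Passing the whole DataFrame makes pandas look for year/month/day columns.
try:
    pd.to_datetime(dtd)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# A DataFrame whose columns ARE named year, month, day is assembled row by row.
parts = pd.DataFrame({"year": [2004, 2004], "month": [1, 1], "day": [2, 5]})
assembled = pd.to_datetime(parts)
print(assembled)

# stack() yields a single Series of strings; unstack() restores the shape.
converted = pd.to_datetime(dtd.stack(), format="%Y-%m-%d").unstack()
print(converted.dtypes)
```

The stacked Series keeps the (row, column) pairs in its index, which is what lets unstack put every value back in its original cell.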

+8

Using apply with to_datetime works for me:

 print(dtd)
             1           2           3           4           5           6
 0
 0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
 1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06

 dtd = dtd.apply(pd.to_datetime)
 print(dtd)
             1           2           3           4           5           6
 0
 0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
 1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
 4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
+5

This works for me:

 dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d')) 

This way you can also pass the keyword arguments of to_datetime (here errors and format), as in the line above. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html for more details.
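As a rough illustration of what errors='coerce' buys you, the sketch below uses a hypothetical frame (column names "a" and "b" and the bad cell are made up for the example) in which one value is not a valid date. With coerce, that cell becomes NaT instead of raising.

```python
import pandas as pd

# Hypothetical frame: one cell is not a valid date.
dtd = pd.DataFrame({
    "a": ["2004-01-02", "not a date"],
    "b": ["2004-01-09", "2004-01-16"],
})

# apply() hands each column to to_datetime as a Series, so the
# DataFrame-level year/month/day assembly is never triggered.
converted = dtd.apply(
    lambda col: pd.to_datetime(col, errors="coerce", format="%Y-%m-%d")
)

print(converted)
print(converted.isna().sum())  # number of unparseable cells per column
```

Counting the NaT values afterwards is a cheap way to check how many cells failed to parse, since coerce otherwise hides them silently.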

0

Source: https://habr.com/ru/post/1258069/

