Change multiple columns in pandas dataframe to datetime

I have a dataframe with 13 columns and 55,000 rows. I am trying to convert 5 of these columns from strings to datetime. Right now their dtype is “object”, and I need to convert this data for machine learning. I know that if I do

 data['birth_date'] = pd.to_datetime(data['birth_date'], errors='coerce')

it will return a datetime column, but I want to do the same for the 4 other columns. Is there one line I can write to convert all of them at once? I don't think I can index like this, for example:

 data[:,7:12] 

thanks!

4 answers

You can use apply to call pd.to_datetime on each column in turn:

 data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce') 
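A minimal, self-contained sketch of the same idea, assuming a small made-up frame with hypothetical columns `id`, `birth_date` and `hire_date`. Here the columns are selected by label and assigned back with `data[cols] = ...`, which replaces them outright, so they end up with a datetime64 dtype. (Assignment through `iloc` may, depending on the pandas version, write into the existing columns in place and leave them as `object`.) Unparseable strings become `NaT` because of `errors='coerce'`.

```python
import pandas as pd

# Hypothetical example data: two date columns stored as strings (dtype object).
data = pd.DataFrame({
    "id": [1, 2, 3],
    "birth_date": ["1990-01-15", "1985-07-30", "not a date"],
    "hire_date": ["2015-03-01", "2018-11-20", "2020-06-05"],
})

date_cols = ["birth_date", "hire_date"]  # hypothetical column names

# apply() calls pd.to_datetime once per column; errors='coerce' turns bad values into NaT.
data[date_cols] = data[date_cols].apply(pd.to_datetime, errors="coerce")

print(data.dtypes)
```

Listing the names explicitly avoids depending on column positions, which can shift if columns are added or dropped.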

If performance is a problem, I would suggest using the following function to convert these columns to datetime:

 import pandas as pd

 def lookup(s):
     """
     This is an extremely fast approach to datetime parsing.
     For large data, the same dates are often repeated. Rather than
     re-parse these, we store all unique dates, parse them, and
     use a lookup to convert all dates.
     """
     dates = {date: pd.to_datetime(date) for date in s.unique()}
     return s.apply(lambda v: dates[v])

Benchmark timings reported in the source:

 to_datetime: 5799 ms
 dateutil: 5162 ms
 strptime: 1651 ms
 manual: 242 ms
 lookup: 32 ms

Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
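A sketch of how the `lookup` helper might be applied to several columns at once; the function is repeated so the block runs on its own, and the column names and data are hypothetical. Recent pandas versions also expose a `cache` parameter on `pd.to_datetime` (enabled by default) that parses unique values once for sufficiently large inputs, so the gap between `lookup` and a plain column-wise `pd.to_datetime` call may be smaller than the benchmark above suggests. It is worth timing both on your own data.

```python
import pandas as pd


def lookup(s):
    """Parse each unique value once, then look every row up in that cache."""
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])


# Hypothetical data: the same few date strings repeated many times.
raw = pd.DataFrame({
    "start": ["2021-01-01", "2021-01-02"] * 3,
    "end": ["2021-02-01"] * 6,
})
cols = ["start", "end"]  # hypothetical column names

# DataFrame.apply passes each column (a Series) to lookup in turn.
converted = raw[cols].apply(lookup)

# Built-in alternative: to_datetime can deduplicate repeated values when cache=True.
builtin = raw[cols].apply(pd.to_datetime, cache=True)

print(converted.dtypes)
```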


First extract the columns you are interested in from data, then use pandas applymap to apply to_datetime to each element of the selected frame. This assumes you know the positions of the columns you want to extract: the code below selects the columns at positions 2 to 14 (the third to fifteenth columns). Alternatively, you can build a list of column names and use it in place of the slice. You may also need to pass the date/time format of your records.

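 import pandas as pd

 # columns at positions 2..14 (third to fifteenth); a list of names also works
 cols_2_extract = data.columns[2:15]
 # %d = day, %m = month number, %Y = four-digit year (%M would mean minutes)
 # note: DataFrame.applymap was renamed DataFrame.map in pandas 2.1
 data[cols_2_extract] = data[cols_2_extract].applymap(lambda x: pd.to_datetime(x, format='%d %m %Y'))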
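A runnable sketch of the positional-slice approach on a small hypothetical frame whose columns at positions 2 and 3 hold day-month-year strings such as "15 01 1990". The format uses `%m` (month number) rather than `%M`, which strptime-style formats treat as minutes. Instead of the element-wise applymap, it calls `pd.to_datetime` once per column with `apply`, which gives the same dates and is usually much faster because each column is parsed in a single call.

```python
import pandas as pd

# Hypothetical frame: columns at positions 2 and 3 hold "day month year" strings.
data = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "start": ["15 01 1990", "30 07 1985"],
    "end": ["01 03 2015", "20 11 2018"],
})

cols_2_extract = data.columns[2:4]  # positional slice of the column labels

# Column-wise: one pd.to_datetime call per column instead of one per cell.
data[cols_2_extract] = data[cols_2_extract].apply(pd.to_datetime, format="%d %m %Y")

print(data.dtypes)
```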
 my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f') 

Note: the format can be changed as needed.
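A quick check of that format string on hypothetical timestamps with fractional seconds, using the placeholder names column1 and column2 from the answer; `%f` parses the digits after the decimal point.

```python
import pandas as pd

# Hypothetical columns with timestamps that include fractional seconds.
my_df = pd.DataFrame({
    "column1": ["2020-05-01 12:30:45.123456", "2020-05-02 08:00:00.5"],
    "column2": ["2021-01-01 00:00:00.000001", "2021-12-31 23:59:59.999999"],
})

cols = ["column1", "column2"]
# One pd.to_datetime call per column with an explicit format string.
my_df[cols] = my_df[cols].apply(pd.to_datetime, format="%Y-%m-%d %H:%M:%S.%f")

print(my_df.dtypes)
```

If some rows lack the fractional part, this exact format will not match them, so you may need `errors='coerce'` or a different format.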


Source: https://habr.com/ru/post/1262445/

