I have a column of identifiers with a timestamp encoded inside each one. For instance:
0 020160910223200_T1
1 020160910223200_T1
2 020160910223203_T1
3 020160910223203_T1
4 020160910223206_T1
5 020160910223206_T1
6 020160910223209_T1
7 020160910223209_T1
8 020160910223213_T1
9 020160910223213_T1
If we drop the first character and the last three characters, the first row gives 20160910223200, which should be converted to 2016-09-10 22:32:00.
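To make the intended transformation concrete, here is a minimal sketch for a single identifier. The variable names (`raw_id`, `stamp`, `parsed`) are only for illustration, and it assumes every identifier has exactly one leading character and a three-character suffix such as `_T1`.

```python
from datetime import datetime

# One identifier copied from the sample above
raw_id = "020160910223200_T1"

# Drop the first character and the last three, leaving 14 digits
stamp = raw_id[1:-3]

# Parse the 14 digits as year, month, day, hour, minute, second
parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")

print(stamp)   # 20160910223200
print(parsed)  # 2016-09-10 22:32:00
```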
My solution was to write a function that truncates identifiers and converts them to datetime. Then I applied this function to the df column.
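from datetime import datetime

def MeasureIDtoTime(MeasureID):
    # Drop the leading character and the trailing '_T1' to keep the 14-digit timestamp
    MeasureID = str(MeasureID)
    MeasureID = MeasureID[1:-3]
    Time = datetime.strptime(MeasureID, '%Y%m%d%H%M%S')
    return Time

# Apply the conversion row by row
df['Time'] = df['MeasureID'].apply(MeasureIDtoTime)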
This works correctly, but it is too slow for my case: I have to process over 20 million rows. Is there a faster way to do this?
Update
According to @MaxU there is a better solution:
df['Time'] = pd.to_datetime(df['MeasureID'].str[1:-3], format='%Y%m%d%H%M%S')
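This does the job in 32 seconds for 7.2 million rows. However, in R, lubridate::ymd_hms() performs the same task in less than 2 seconds, so I am wondering whether there is a faster solution in Python.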
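For anyone who wants to reproduce this, here is a minimal self-contained sketch of the vectorized approach on a few rows copied from the sample above. It assumes the column is named `MeasureID`, as in my original function, and that all identifiers share the same layout. It is only meant to show the shape of the call, not to serve as a benchmark.

```python
import pandas as pd

# A small frame built from the sample identifiers
df = pd.DataFrame({
    "MeasureID": [
        "020160910223200_T1",
        "020160910223203_T1",
        "020160910223213_T1",
    ]
})

# Slice all strings at once, then parse them with an explicit format
# so pandas does not have to infer the layout
df["Time"] = pd.to_datetime(df["MeasureID"].str[1:-3], format="%Y%m%d%H%M%S")

print(df)
```

Both the slicing and the parsing operate on the whole column here, rather than calling a Python function once per row as `apply` does.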