I want to quickly parse about 10-20 million ISO timestamp strings, accurate to microseconds, into datetime64 for use as a DataFrame index in pandas.
I am on pandas 0.9 and have tried the suggested git solutions, but parsing takes 20-30 minutes or never finishes.
I think I found the problem. Compare the speed of the two:
    rng = date_range('1/1/2000', periods=2000000, freq='ms')
    strings = [x.strftime('%Y-%m-%d %H:%M:%S.%f') for x in rng]
    timeit to_datetime(strings)
On my laptop: ~300 ms.
    rng = date_range('1/1/2000', periods=2000000, freq='ms')
    strings = [x.strftime('%Y%m%dT%H%M%S.%f') for x in rng]
    timeit to_datetime(strings)
On my laptop: forever and a day.
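For what it's worth, on more recent pandas versions the slow path can be avoided entirely by passing an explicit `format` to `to_datetime`, which skips the per-element fallback parser. A minimal sketch (this assumes a pandas new enough to accept the `format` argument, which 0.9 may not):

```python
import pandas as pd

# Generate a sample of compact-ISO strings like the slow case above.
rng = pd.date_range('1/1/2000', periods=100000, freq='ms')
compact = [x.strftime('%Y%m%dT%H%M%S.%f') for x in rng]

# Supplying the format explicitly parses the whole list in one fast pass
# instead of falling back to slow per-element inference.
parsed = pd.to_datetime(compact, format='%Y%m%dT%H%M%S.%f')
```

The result is a DatetimeIndex with microsecond precision, suitable as a DataFrame index.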
For now I'm probably just going to change the C++ code that generates the timestamps to emit them in the more verbose ISO format, because looping over tens of millions of strings to fix the format in Python would probably be quite slow ...
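That said, since the compact strings are fixed-width, rewriting them into the fast format is just string slicing, which may well be cheaper than the slow fallback parser. A hedged sketch (the slice positions assume the exact `%Y%m%dT%H%M%S.%f` layout shown above):

```python
import pandas as pd

def expand_iso(s):
    # '20000101T123456.789012' -> '2000-01-01 12:34:56.789012'
    # Slice positions assume the fixed-width compact layout.
    return '%s-%s-%s %s:%s:%s%s' % (
        s[0:4], s[4:6], s[6:8], s[9:11], s[11:13], s[13:15], s[15:])

compact = ['20000101T123456.789012']
friendly = [expand_iso(s) for s in compact]

# The rewritten strings take pandas' fast parsing path.
parsed = pd.to_datetime(friendly)
```

Whether this beats changing the C++ generator depends on how often the data is regenerated; fixing the producer once is cleaner if that option is available.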