A faster way to process a time string using python

Question


I have many log files with a format such as:

2012-09-12 23:12:00 other logs here

and I need to extract the time string and compute the time delta between two log entries. I did it like this:

    import re

    for line in log:
        l = line.strip().split()
        timelist = [int(n) for n in re.split("[- :]", l[0] + ' ' + l[1])]
        # now the timelist looks like [2012, 9, 12, 23, 12, 0]

Then, once I have two entries:

    import datetime

    d1 = datetime.datetime(timelist1[0], timelist1[1], timelist1[2], timelist1[3], timelist1[4], timelist1[5])
    d2 = datetime.datetime(timelist2[0], timelist2[1], timelist2[2], timelist2[3], timelist2[4], timelist2[5])
    # total_seconds() covers the whole difference; .seconds alone would drop whole days
    delta = (d2 - d1).total_seconds()

The problem is that this is slow. Is there a way to improve performance? Thanks in advance.
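A self-contained sketch of the question's two steps, run on the sample line plus a second, made-up entry two days later. It also shows why the delta step needs care: timedelta.seconds is only the seconds component of the difference (0 to 86399), so whole days are silently dropped, while total_seconds() returns the full span. The helper name parse_timelist is hypothetical.

```python
import datetime
import re

def parse_timelist(line):
    # The question's approach: split on whitespace, rejoin the date and time
    # fields, then split on '-', ' ' and ':' and convert every piece to int.
    fields = line.strip().split()
    return [int(n) for n in re.split("[- :]", fields[0] + ' ' + fields[1])]

timelist1 = parse_timelist("2012-09-12 23:12:00 other logs here")
timelist2 = parse_timelist("2012-09-14 23:12:05 two days later")  # hypothetical entry
print(timelist1)  # [2012, 9, 12, 23, 12, 0]

# *timelist unpacks the six ints, same as passing timelist[0] through timelist[5].
d1 = datetime.datetime(*timelist1)
d2 = datetime.datetime(*timelist2)
delta = d2 - d1
print(delta.seconds)          # 5: only the seconds component, the 2 days are dropped
print(delta.total_seconds())  # 172805.0: the whole difference in seconds
```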


python

cheneydeng Sep 13 '12 at 3:49


3 answers

You can do the whole parse with a single precompiled regular expression, which can be faster:

    import datetime
    import re

    find_time = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")

    for line in log:
        timelist = find_time.match(line)
        if timelist:
            d = datetime.datetime(*map(int, timelist.groups()))
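A minimal sketch of how the precompiled pattern could feed the delta computation from the question: keep the previous datetime and yield the difference to each new one, skipping lines that do not start with a timestamp. The helper deltas_regex and the sample lines are hypothetical; total_seconds() is used so that differences longer than a day are not truncated.

```python
import datetime
import re

# Pattern for "YYYY-MM-DD HH:MM:SS" at the start of a line (match() anchors there anyway).
FIND_TIME = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")

def deltas_regex(lines):
    """Yield the seconds between consecutive timestamped lines."""
    prev = None
    for line in lines:
        m = FIND_TIME.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        d = datetime.datetime(*map(int, m.groups()))
        if prev is not None:
            yield (d - prev).total_seconds()
        prev = d

log = [
    "2012-09-12 23:12:00 other logs here",
    "2012-09-12 23:12:30 more logs",
    "continuation line without a timestamp",
    "2012-09-12 23:14:00 last entry",
]
print(list(deltas_regex(log)))  # [30.0, 90.0]
```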


grc Sep 13 '12 at 4:02


You can also avoid regular expressions entirely by using the optional maxsplit argument of split:

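    import datetime

    # maxsplit=2: split on the first two spaces only, so the message stays in one piece
    (date, time, rest) = line.split(" ", 2)
    timerecord = datetime.datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")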

Then you will need to calculate the timedeltas between consecutive timerecords.
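One way to do that last step, sketched under the assumption that all records fit in memory: parse every line with strptime, then pair each record with its successor via zip(records, records[1:]). The generator name timerecords and the sample lines are hypothetical.

```python
import datetime

def timerecords(lines):
    for line in lines:
        # maxsplit=2 stops after the second space, so the message is not split further.
        date, time, _rest = line.split(" ", 2)
        yield datetime.datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")

log = [
    "2012-09-12 23:12:00 other logs here",
    "2012-09-12 23:12:30 more logs",
    "2012-09-13 00:00:00 next day",
]
records = list(timerecords(log))
# Pair each record with the one after it and take the difference.
deltas = [later - earlier for earlier, later in zip(records, records[1:])]
print([d.total_seconds() for d in deltas])  # [30.0, 2850.0]
```

Using total_seconds() rather than .seconds keeps the result correct if two consecutive entries are more than a day apart.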


Pierre GM Sep 13 '12 at 10:03


Blender · Accepted Answer · 2012-09-13T03:54:56+0000

You can get rid of the regular expression and use map:

    import datetime

    date_time = datetime.datetime  # alias saves the .datetime attribute lookup on every iteration

    for line in log:
        date, time = line.strip().split(' ', 2)[:2]
        timelist = map(int, date.split('-') + time.split(':'))
        d = date_time(*timelist)
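The same idea wrapped in a function, as a sketch for checking it against strptime on the sample line. It drops .strip(), since int() already ignores surrounding whitespace such as a trailing newline. In Python 3, map() returns a lazy iterator rather than a list, and * unpacking consumes it directly, so no list() call is needed. The name parse_split_map is hypothetical.

```python
import datetime

date_time = datetime.datetime  # alias, as in the answer

def parse_split_map(line):
    # Split at most twice on single spaces and keep only the date and time fields.
    date, time = line.split(' ', 2)[:2]
    # int() tolerates surrounding whitespace, so a trailing "\n" is harmless here.
    return date_time(*map(int, date.split('-') + time.split(':')))

line = "2012-09-12 23:12:00 other logs here"
d = parse_split_map(line)
print(d)  # 2012-09-12 23:12:00
assert d == datetime.datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
```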

I think .split(' ', 2) will be faster than a plain .split(), because it splits at most twice and only on single spaces, rather than on every run of whitespace.
map(int, l) was faster than [int(x) for x in l] the last time I checked.
If you can, get rid of .strip() as well.
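Since these speed claims depend on the Python version and machine, they are worth re-checking locally. A minimal timeit sketch follows; the sample line, the parts list, and the function name benchmark are illustrative. Note that in Python 3 map() is lazy, so list() is wrapped around it to make the comparison with the list comprehension fair.

```python
import timeit

def benchmark(number=100_000, repeat=3):
    """Return the best-of-`repeat` time in seconds for `number` calls of each variant."""
    line = "2012-09-12 23:12:00 other logs here\n"
    parts = ["2012", "09", "12", "23", "12", "00"]
    variants = {
        # Splitting: strip plus split on any whitespace vs. at most two single-space splits.
        "line.strip().split()": lambda: line.strip().split(),
        "line.split(' ', 2)": lambda: line.split(' ', 2),
        # Integer conversion: list() forces map's lazy iterator in Python 3.
        "list(map(int, parts))": lambda: list(map(int, parts)),
        "[int(x) for x in parts]": lambda: [int(x) for x in parts],
    }
    return {
        name: min(timeit.repeat(fn, number=number, repeat=repeat))
        for name, fn in variants.items()
    }

for name, seconds in benchmark().items():
    print(f"{name:26} {seconds:.4f} s")  # smaller is faster
```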
