Python: grouping events into time intervals (minutes) across several days of data

Question

I have a list of events with millisecond-accurate timestamps that spans several days. I want to group all the events that occur in each "per-n-minutes" slot (a slot may hold twenty events, or none at all). I have a datetime.datetime item for each event, so I can get datetime.datetime.minute without any problems.

My list of events is sorted by time, earliest first, latest last. The list is complete for the time period I am working on.

The idea is that I can change the list:

 [[a],[b],[c],[d],[e],[f],[g],[h],[i]...]

where a, b, c, occur between mins 0 and 29, d, e, f, g occur between mins 30 and 59, nothing between 0 and 29 (next hour), h, i between 30 and 59 ...

to the new list:

 [[[a],[b],[c]],[[d],[e],[f],[g]],[],[[h],[i]]...]

I am not sure how to build an iterator that steps through the time slots until the end of the time series list is reached. Everything I can think of using xrange stops once it has completed, so I wondered if there was a way to use `while` to do the slicing?

I will also be using a smaller time slot, probably 5 minutes; I used 30 minutes to keep the demonstration example short.

(For context, I am building a time-based geographic plot of the recent earthquakes in New Zealand, and I want to show all the earthquakes that occur in a small block of time in one step to speed up the replay.)

+6

python grouping

Jay gattuso Jul 25 '13 at 7:17

6 answers

If you have the whole list, you can simply iterate over it and insert each event into the right time interval directly:

    grouped = [[] for _ in xrange(whatever)]
    for event in events:
        grouped[timeslot_of(event)].append(event)

If you need to turn an iterable of events into a grouped iterable, things get a little messier. itertools.groupby almost works, but it skips time intervals that have no events in them.
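One way to handle the iterable case is a small generator that closes each slot as soon as an event falls past its end. This is only a sketch, written for Python 3: the name `group_by_slot` and its parameters are made up for illustration, and it assumes the events are datetimes in ascending order, none earlier than `start`.

```python
from datetime import datetime, timedelta


def group_by_slot(events, start, width):
    """Yield one list per time slot of length `width`, beginning at `start`.

    Assumes `events` is an iterable of datetimes in ascending order and
    that no event is earlier than `start`. Empty slots come out as [].
    """
    slot_end = start + width
    current = []
    for event in events:
        # Close every slot that ends at or before this event.
        while event >= slot_end:
            yield current
            current = []
            slot_end += width
        current.append(event)
    yield current  # the last, possibly partial, slot


base = datetime(2013, 7, 25, 7, 0)
sample = [base + timedelta(minutes=m) for m in (1, 5, 29, 30, 45, 95, 100)]
groups = list(group_by_slot(sample, base, timedelta(minutes=30)))
print([len(g) for g in groups])  # [3, 2, 0, 2]
```

Because it only looks at one event at a time, this works on any sorted iterable without loading it into a list first.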

+1

user2357112 Jul 25 '13 at 7:31

Assuming the events are available in a chronologically ordered list called events, and each event has a datetime attribute called timestamp:

    interval = 10         # slot length in minutes
    period = 2*24*60      # two days in minutes
    timeslots = [[] for slot in range(period // interval)]
    for e in events:
        # whole minutes since the first event, then integer-divide into slots
        index = int((e.timestamp - events[0].timestamp).total_seconds() / 60) // interval
        timeslots[index].append(e)

This uses the first event as t = 0 on the timeline. If that is not what you want, just replace the events[0].timestamp reference with a datetime instance that represents your t = 0.
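As an illustration of choosing a different t = 0, the sketch below aligns the timeline to midnight of the first event's day and sizes the slot list from the data instead of a fixed two-day period. It is written for Python 3; the `Event` type and the `bucket_events` helper are hypothetical names, not part of the answer above.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical event type with a `timestamp` attribute, as assumed above.
Event = namedtuple("Event", ["timestamp", "name"])


def bucket_events(events, interval_minutes, t0=None):
    """Bucket chronologically sorted events into fixed-width slots."""
    if not events:
        return []
    if t0 is None:
        # Assumption: t = 0 is midnight of the day of the first event.
        t0 = events[0].timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    width = timedelta(minutes=interval_minutes)
    # timedelta // timedelta gives an int in Python 3.
    n_slots = (events[-1].timestamp - t0) // width + 1
    slots = [[] for _ in range(n_slots)]
    for e in events:
        slots[(e.timestamp - t0) // width].append(e)
    return slots


day = datetime(2013, 7, 25)
events = [
    Event(day + timedelta(minutes=5), "a"),
    Event(day + timedelta(minutes=12), "b"),
    Event(day + timedelta(minutes=31), "c"),
]
slots = bucket_events(events, 10)
print([[e.name for e in slot] for slot in slots])  # [['a'], ['b'], [], ['c']]
```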

+1

Henrik Jul 25 '13 at 7:37

Consider the following:

    def time_in_range(t, t_min, delta_t):
        # True when t lies in the closed interval [t_min, t_min + delta_t]
        return t_min <= t <= t_min + delta_t

    def group_list(input_list, ref_time, time_dx, result=None):
        if result is None:  # avoid a shared mutable default argument
            result = []
        result.append([])
        for i, item in enumerate(input_list):
            if time_in_range(item, ref_time, time_dx):
                result[-1].append(item)
            else:
                # item belongs to a later slot: recurse on the remainder
                return group_list(input_list[i:], ref_time + time_dx, time_dx, result=result)
        return result  # reached once the whole list has been consumed

    def test():
        input_list = [1, 2, 3, 4, 5, 8, 10, 20, 30]
        print(group_list(input_list, 0, 5))

    test()
    # Output:
    # [[1, 2, 3, 4, 5], [8, 10], [], [20], [], [30]]

You will need to write your own time_in_range function for your datetime values.
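A datetime version of time_in_range could look like the sketch below (Python 3). Note one deliberate difference from the integer version: it uses a half-open interval, so an event exactly on a boundary falls into the later slot, which matches the 0-29 / 30-59 split described in the question. That choice is an assumption, not something the answer specifies.

```python
from datetime import datetime, timedelta


def time_in_range(t, t_min, delta_t):
    """Return True when t_min <= t < t_min + delta_t (half-open interval)."""
    return t_min <= t < t_min + delta_t


slot_start = datetime(2013, 7, 25, 7, 0)
half_hour = timedelta(minutes=30)

print(time_in_range(datetime(2013, 7, 25, 7, 29, 59), slot_start, half_hour))  # True
print(time_in_range(datetime(2013, 7, 25, 7, 30), slot_start, half_hour))      # False
```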

+1

esmit Jul 25 '13 at 7:57

I wondered if there was a way to use `while` to do the slicing?

I have this definition, which may help you. It has no library dependencies and uses a while loop, as requested.

Suppose you have two lists of the same length, unix timestamps and values, where timestamps[0] is the timestamp for values[0], and so on:

    timestamps = [unix, unix, unix, ....etc.]
    values = [0.1, 0.2, 0.5, 1.1, ....etc.]

Let's say you have 30 days of data starting in November 2011, and you want it grouped hourly:

    BEGIN = 1320105600  # start of the period as a unix timestamp

    def group_hourly(timestamps, values):  # function name is illustrative
        hourly_values = []
        z = 0
        while z < 720:  # 24 hours * 30 days = 720
            hourly_values.append([])  # append a new empty list for each hour
            for i in range(len(timestamps)):
                if timestamps[i] >= (BEGIN + 3600*z):  # 3600 sec = 1 hour
                    if timestamps[i] < (BEGIN + 3600*(z+1)):
                        hourly_values[z].append(values[i])
            z += 1
        return hourly_values

This returns one list per hour, with empty lists for the hours without data.
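The loop above rescans every timestamp once per hour. If that becomes slow for several days of millisecond-level data, a single pass that computes each value's slot index directly is one alternative. The following is a sketch in Python 3 under the same assumptions (unix timestamps, 720 hourly slots); the function name and its keyword parameters are illustrative.

```python
BEGIN = 1320105600  # 2011-11-01 00:00:00 UTC as a unix timestamp
HOUR = 3600         # seconds per hour


def group_hourly_single_pass(timestamps, values, begin=BEGIN, n_hours=720):
    """Group values into hourly lists with one pass over the data."""
    hourly_values = [[] for _ in range(n_hours)]
    for ts, value in zip(timestamps, values):
        # int() keeps the index usable when timestamps are floats.
        index = int((ts - begin) // HOUR)
        if 0 <= index < n_hours:  # ignore anything outside the period
            hourly_values[index].append(value)
    return hourly_values


ts = [BEGIN + 10, BEGIN + 3599, BEGIN + 3600, BEGIN + 3 * 3600 + 1]
vals = [0.1, 0.2, 0.5, 1.1]
result = group_hourly_single_pass(ts, vals)
print(result[:4])  # [[0.1, 0.2], [0.5], [], [1.1]]
```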

0

litepresence Mar 15 '15 at 3:44

You can use the slotter module. I had a similar problem and ended up writing a generic solution: https://github.com/saurabh-hirani/slotter

Asciinema demo: https://asciinema.org/a/8mm8f0qqurk4rqt90drkpvp1b?autoplay=1

0

Saurabh hirani Oct 10 '16 at 9:28

sloth · Accepted Answer · 2013-07-25T07:44:10+0000

    # create sample data
    from datetime import datetime, timedelta
    d = datetime.now()
    data = [d + timedelta(minutes=i) for i in xrange(100)]

    # prepare and group the data
    from itertools import groupby

    def get_key(d):
        # group by 30 minutes
        k = d + timedelta(minutes=-(d.minute % 30))
        return datetime(k.year, k.month, k.day, k.hour, k.minute, 0)

    g = groupby(sorted(data), key=get_key)

    # print data
    for key, items in g:
        print key
        for item in items:
            print '-', item

This is a Python translation of this answer, which works by rounding each datetime down to the start of its 30-minute interval and using that value as the grouping key.

If you really need the empty groups, you can simply add them using this or a similar method:

    def add_missing_empty_frames(g):
        last_key = None
        for key, items in g:
            if last_key:
                # total_seconds() also counts whole days, so gaps longer
                # than a day are filled correctly (.seconds would drop them)
                while (key - last_key).total_seconds() > 30*60:
                    empty_key = last_key + timedelta(minutes=30)
                    yield (empty_key, [])
                    last_key = empty_key
            yield (key, items)
            last_key = key

    for key, items in add_missing_empty_frames(g):
        ...
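To see both pieces working together on data with a gap, here is a self-contained sketch for Python 3. It follows the same idea (round down to the slot start, group, then fill the holes), but the name `grouped_with_gaps` and the sample data are made up for illustration, and the key function assumes a slot width of 30 minutes (or another width that divides 60).

```python
from datetime import datetime, timedelta
from itertools import groupby

SLOT = timedelta(minutes=30)


def get_key(d):
    # Round down to the start of the 30-minute slot containing d.
    return d.replace(minute=d.minute - d.minute % 30, second=0, microsecond=0)


def grouped_with_gaps(data):
    """Yield (slot_start, events) pairs, including empty slots between groups."""
    last_key = None
    for key, items in groupby(sorted(data), key=get_key):
        if last_key is not None:
            expected = last_key + SLOT
            while expected < key:  # emit every slot that groupby skipped
                yield expected, []
                expected += SLOT
        yield key, list(items)  # materialize before groupby advances
        last_key = key


start = datetime(2013, 7, 25, 7, 0)
data = [start + timedelta(minutes=m) for m in (0, 10, 20, 35, 40, 50, 55, 95, 100)]
for key, items in grouped_with_gaps(data):
    print(key.strftime("%H:%M"), len(items))
# 07:00 3
# 07:30 4
# 08:00 0
# 08:30 2
```

The sample mirrors the layout in the question: three events, then four, then an empty slot, then two.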
