Group a Python list by date

Let's say I have a list that looks like this:

[(datetime.datetime(2013, 8, 8, 1, 20, 15), 2060), (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055), (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050), (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050), (datetime.datetime(2013, 8, 10, 1, 19, 51), 2050), (datetime.datetime(2013, 8, 11, 2, 4, 53), 2050), (datetime.datetime(2013, 8, 12, 0, 29, 45), 2050), (datetime.datetime(2013, 8, 12, 0, 44, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 34, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 47, 29), 2050), (datetime.datetime(2013, 8, 14, 1, 30, 39), 2050), (datetime.datetime(2013, 8, 14, 1, 33, 51), 2050), (datetime.datetime(2013, 8, 15, 0, 41, 1), 2050), (datetime.datetime(2013, 8, 15, 0, 54, 45), 2050), (datetime.datetime(2013, 8, 16, 0, 29, 57), 1950), (datetime.datetime(2013, 8, 16, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 17, 0, 27, 4), 1950), (datetime.datetime(2013, 8, 17, 0, 42, 30), 1950), (datetime.datetime(2013, 8, 18, 0, 26, 26), 1950), (datetime.datetime(2013, 8, 18, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 19, 0, 41, 49), 1950), (datetime.datetime(2013, 8, 20, 1, 10, 23), 1950), (datetime.datetime(2013, 8, 20, 1, 23, 44), 1950), (datetime.datetime(2013, 8, 21, 0, 47, 25), 1950), (datetime.datetime(2013, 8, 21, 1, 0, 12), 1950), (datetime.datetime(2013, 8, 22, 0, 45, 21), 1950), (datetime.datetime(2013, 8, 22, 1, 4, 33), 1950), (datetime.datetime(2013, 8, 23, 0, 51, 27), 1950), (datetime.datetime(2013, 8, 23, 1, 6, 36), 1950), (datetime.datetime(2013, 8, 24, 0, 41, 3), 1950), (datetime.datetime(2013, 8, 24, 0, 53, 14), 1950), (datetime.datetime(2013, 8, 25, 0, 29, 24), 1950), (datetime.datetime(2013, 8, 25, 0, 42, 40), 1950), (datetime.datetime(2013, 8, 26, 0, 28, 13), 1950), (datetime.datetime(2013, 8, 26, 0, 43, 30), 1950), (datetime.datetime(2013, 8, 27, 0, 30, 1), 1950), (datetime.datetime(2013, 8, 27, 0, 43, 43), 1950), (datetime.datetime(2013, 8, 28, 0, 33, 19), 1950), (datetime.datetime(2013, 8, 28, 0, 49, 11), 1950), 
(datetime.datetime(2013, 8, 29, 0, 26, 49), 1950), (datetime.datetime(2013, 8, 29, 0, 41, 21), 1950), (datetime.datetime(2013, 8, 30, 0, 26, 13), 1950), (datetime.datetime(2013, 8, 30, 0, 42, 9), 1950), (datetime.datetime(2013, 8, 31, 0, 23, 40), 1950), (datetime.datetime(2013, 8, 31, 0, 39, 49), 1950), (datetime.datetime(2013, 9, 1, 0, 22, 2), 1950), (datetime.datetime(2013, 9, 1, 0, 38, 16), 1950), (datetime.datetime(2013, 9, 2, 0, 21, 2), 1950), (datetime.datetime(2013, 9, 2, 0, 36, 19), 1950), (datetime.datetime(2013, 9, 3, 0, 22, 16), 1950), (datetime.datetime(2013, 9, 3, 0, 39, 2), 1900)] 

As you can see, this is a list of tuples, and the first element of each tuple is a timestamp. It is already in a convenient format, generated with:

 datetime.strptime(record[0], timeFormat) 

The second element is the monitored value. However, there can be several entries per day. For example, there are two entries for datetime.datetime(2013, 8, 9, ...), with the values 2055 and 2050. What I actually want is the maximum for each day, so in this case 2055 would be the only entry for (2013, 8, 9).

I am wondering whether there is a convenient way to do this in Python, something similar to this MySQL query:

 select date(timestamp), max(value) from table group by date(timestamp) 

The MySQL query is only there to show the idea; I definitely want a Python solution.

2 answers

Use itertools.groupby:

 >>> records = [(datetime.datetime(2013, 8, 8, 1, 20, 15), 2060), ....]
 >>> import itertools
 >>> [(dt, max(v for d, v in grp)) for dt, grp in itertools.groupby(records, key=lambda x: x[0].date())]
 [(datetime.date(2013, 8, 8), 2060), (datetime.date(2013, 8, 9), 2055), (datetime.date(2013, 8, 10), 2050), ... ]

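NOTE: this assumes the records are sorted by date. itertools.groupby only groups consecutive items that share a key, so if the records are not sorted, you must sort them by date first.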
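
If the records might not be in chronological order, the sort-then-group step could be sketched roughly as follows in Python 3. The small unsorted `records` sample and the names `day_of`, `records_by_day` and `daily_max` are illustrative assumptions, not part of the original post:

```python
import datetime
import itertools

# Small, deliberately unsorted sample (illustrative, not the full list from the question).
records = [
    (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050),
    (datetime.datetime(2013, 8, 8, 1, 20, 15), 2060),
    (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050),
    (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055),
]


def day_of(record):
    """Return the calendar date of a (timestamp, value) record."""
    return record[0].date()


# groupby only merges adjacent items with equal keys, so sort by the same key first.
records_by_day = sorted(records, key=day_of)

daily_max = [
    (day, max(value for _, value in group))
    for day, group in itertools.groupby(records_by_day, key=day_of)
]
print(daily_max)
```

The sort makes this O(N log N) overall, which is usually fine for data of this size.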


You can use collections.defaultdict (this works for both sorted and unsorted data, in O(N) time):

 >>> from collections import defaultdict
 >>> lis = [(datetime.datetime(2013, 8, 8, 1, 20, 15), 2060), (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055), (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050), (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050), (datetime.datetime(2013, 8, 10, 1, 19, 51), 2050), (datetime.datetime(2013, 8, 11, 2, 4, 53), 2050), (datetime.datetime(2013, 8, 12, 0, 29, 45), 2050), (datetime.datetime(2013, 8, 12, 0, 44, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 34, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 47, 29), 2050), (datetime.datetime(2013, 8, 14, 1, 30, 39), 2050), (datetime.datetime(2013, 8, 14, 1, 33, 51), 2050), (datetime.datetime(2013, 8, 15, 0, 41, 1), 2050), (datetime.datetime(2013, 8, 15, 0, 54, 45), 2050), (datetime.datetime(2013, 8, 16, 0, 29, 57), 1950), (datetime.datetime(2013, 8, 16, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 17, 0, 27, 4), 1950), (datetime.datetime(2013, 8, 17, 0, 42, 30), 1950), (datetime.datetime(2013, 8, 18, 0, 26, 26), 1950), (datetime.datetime(2013, 8, 18, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 19, 0, 41, 49), 1950), (datetime.datetime(2013, 8, 20, 1, 10, 23), 1950), (datetime.datetime(2013, 8, 20, 1, 23, 44), 1950), (datetime.datetime(2013, 8, 21, 0, 47, 25), 1950), (datetime.datetime(2013, 8, 21, 1, 0, 12), 1950), (datetime.datetime(2013, 8, 22, 0, 45, 21), 1950), (datetime.datetime(2013, 8, 22, 1, 4, 33), 1950), (datetime.datetime(2013, 8, 23, 0, 51, 27), 1950), (datetime.datetime(2013, 8, 23, 1, 6, 36), 1950), (datetime.datetime(2013, 8, 24, 0, 41, 3), 1950), (datetime.datetime(2013, 8, 24, 0, 53, 14), 1950), (datetime.datetime(2013, 8, 25, 0, 29, 24), 1950), (datetime.datetime(2013, 8, 25, 0, 42, 40), 1950), (datetime.datetime(2013, 8, 26, 0, 28, 13), 1950), (datetime.datetime(2013, 8, 26, 0, 43, 30), 1950), (datetime.datetime(2013, 8, 27, 0, 30, 1), 1950), (datetime.datetime(2013, 8, 27, 0, 43, 43), 1950), (datetime.datetime(2013, 8, 28, 0, 33, 19), 1950),
 ... (datetime.datetime(2013, 8, 28, 0, 49, 11), 1950), (datetime.datetime(2013, 8, 29, 0, 26, 49), 1950), (datetime.datetime(2013, 8, 29, 0, 41, 21), 1950), (datetime.datetime(2013, 8, 30, 0, 26, 13), 1950), (datetime.datetime(2013, 8, 30, 0, 42, 9), 1950), (datetime.datetime(2013, 8, 31, 0, 23, 40), 1950), (datetime.datetime(2013, 8, 31, 0, 39, 49), 1950), (datetime.datetime(2013, 9, 1, 0, 22, 2), 1950), (datetime.datetime(2013, 9, 1, 0, 38, 16), 1950), (datetime.datetime(2013, 9, 2, 0, 21, 2), 1950), (datetime.datetime(2013, 9, 2, 0, 36, 19), 1950), (datetime.datetime(2013, 9, 3, 0, 22, 16), 1950), (datetime.datetime(2013, 9, 3, 0, 39, 2), 1900)]
 >>> dic = defaultdict(list)
 >>> for dt, val in lis:
 ...     dic[dt.date()].append(val)
 ...
 >>> for k, v in dic.iteritems():
 ...     print k, max(v)
 ...
 2013-08-20 1950
 2013-08-15 2050
 2013-08-22 1950
 2013-08-09 2055
 2013-08-16 1950
 2013-08-11 2050
 2013-08-18 1950
 2013-09-03 1950
 2013-09-01 1950
 ...

As @hughdbrown pointed out, a better way is to keep only the running maximum for each day instead of storing every value:

 >>> dic = {}
 >>> for dt, val in lis:
 ...     dt = dt.date()
 ...     dic[dt] = max(dic.get(dt, 0), val)
 ...
 >>> for k, v in dic.iteritems():
 ...     print k, v
 ...
 2013-08-20 1950
 2013-08-15 2050
 2013-08-22 1950
 2013-08-09 2055
 2013-08-16 1950
 2013-08-11 2050
 2013-08-18 1950
 2013-09-03 1950
 2013-09-01 1950
 ...
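
For Python 3, the same running-maximum idea might be written roughly as below. This is only a sketch under a few assumptions: the shortened `lis` sample and the helper name `max_per_day` are illustrative, and the membership check replaces the `0` default so that a day whose values are all negative would still be reported correctly.

```python
import datetime


def max_per_day(records):
    """Map each calendar date to the largest value recorded on that date."""
    best = {}
    for timestamp, value in records:
        day = timestamp.date()
        # Store the value if the day is new or it beats the current maximum.
        if day not in best or value > best[day]:
            best[day] = value
    return best


# Shortened sample taken from the question's data (illustrative subset).
lis = [
    (datetime.datetime(2013, 8, 8, 1, 20, 15), 2060),
    (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055),
    (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050),
    (datetime.datetime(2013, 9, 3, 0, 22, 16), 1950),
    (datetime.datetime(2013, 9, 3, 0, 39, 2), 1900),
]

result = max_per_day(lis)
for day in sorted(result):
    print(day, result[day])
```

Iterating over `sorted(result)` prints the days in chronological order regardless of the order in which the records arrived.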

Source: https://habr.com/ru/post/1500440/

