How to group / count list items by range

Given two lists x and y:

    x = [10, 20, 30]
    y = [1, 2, 3, 15, 22, 27]

I would like the return value to be a dictionary that maps each value in x to the number of elements of y that are less than it but not less than the previous value of x:

    {10: 3, 20: 1, 30: 2}

My lists are very large, so I was hoping for a better way to do this that does not require a slow nested loop. I looked at collections.Counter and itertools, but neither seems to offer a way to group by range. Is there a built-in module that can do this?
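
To pin down exactly what is being asked, here is a minimal reference sketch of the slow nested-loop version the question wants to avoid. It is only meant as a yardstick for checking the faster answers. The function name `count_by_range_naive` is hypothetical, and it assumes x is sorted ascending, that the first bucket has no lower limit, and that values not less than the last bound are simply ignored.

```python
def count_by_range_naive(bounds, values):
    """Reference version: for each upper bound, count the values that are
    >= the previous bound and < this bound. O(len(bounds) * len(values))."""
    counts = {}
    lower = float("-inf")  # assumption: the first bucket has no lower limit
    for upper in bounds:
        # Half-open bucket [lower, upper): a value equal to `upper` goes to the next one.
        counts[upper] = sum(1 for v in values if lower <= v < upper)
        lower = upper
    return counts


x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]
print(count_by_range_naive(x, y))  # {10: 3, 20: 1, 30: 2}
```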

4 answers

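You can use the bisect module and collections.Counter. For a sorted x, bisect.bisect_right(x, item) returns the index of the first element of x that is strictly greater than item, so each item is counted under the smallest value of x it is less than (bisect_left would instead count an item equal to a value of x under that same value):

    >>> import bisect
    >>> from collections import Counter
    >>> Counter(x[bisect.bisect_right(x, item)] for item in y)
    Counter({10: 3, 30: 2, 20: 1})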
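
The one-liner above raises IndexError for any item that is not less than the last value of x, and buckets that receive no items are missing from the Counter. A minimal sketch that handles both, assuming x is sorted ascending; the function name `count_by_range` and the choice of `None` as the overflow key are hypothetical:

```python
import bisect
from collections import Counter


def count_by_range(bounds, values):
    """Count values per bucket: bucket b holds values >= the previous bound and < b.
    Assumes `bounds` is sorted ascending. Values >= bounds[-1] are collected
    under the key None instead of raising IndexError."""
    counter = Counter()
    for v in values:
        i = bisect.bisect_right(bounds, v)  # index of the first bound strictly > v
        counter[bounds[i] if i < len(bounds) else None] += 1
    # Report every bound, including empty buckets, in ascending order.
    # (Reading a missing key from a Counter returns 0 without inserting it.)
    result = {b: counter[b] for b in bounds}
    if counter[None]:
        result[None] = counter[None]
    return result


print(count_by_range([10, 20, 30], [1, 2, 3, 15, 22, 27, 30]))
# {10: 3, 20: 1, 30: 2, None: 1}
```

This is still a single pass over y with an O(log len(x)) lookup per item, so it keeps the scaling of the bisect answer.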

If you can use NumPy, what you are asking for is essentially a histogram:

    import numpy as np

    x = [10, 20, 30]
    y = [1, 2, 3, 15, 22, 27]
    np.histogram(y, bins=[0] + x)
    # (array([3, 1, 2]), array([ 0, 10, 20, 30]))

To turn this into a dict:

    b = np.histogram(y, bins=[0] + x)[0]
    d = {k: v for k, v in zip(x, b)}
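
One caveat with np.histogram: every bin is half-open except the last one, which includes its right edge, so a value equal to 30 is counted in the 30 bucket, and values below 0 or above 30 are dropped. If you want the strict "less than" semantics of the question everywhere, a sketch using np.searchsorted and np.bincount (swapped in here, not part of the original answer) could look like this; it assumes x is sorted ascending:

```python
import numpy as np

x = [10, 20, 30]
y = np.array([1, 2, 3, 15, 22, 27])

# For each y, the index of the first bound strictly greater than it.
# side="right" mirrors bisect.bisect_right: a value equal to a bound
# goes to the next bucket.
idx = np.searchsorted(x, y, side="right")

# Count occurrences of each index; minlength keeps empty buckets.
# The extra slot at index len(x) collects values >= x[-1].
counts = np.bincount(idx, minlength=len(x) + 1)

# Convert NumPy integers to plain ints for a clean dict.
d = {k: int(c) for k, c in zip(x, counts[:len(x)])}
print(d)  # {10: 3, 20: 1, 30: 2}
```

Unlike the histogram version, negative values land in the first bucket and values at or above 30 land in the overflow slot instead of being dropped or merged.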

For short lists this is not worth it, but for long lists it can be much faster (IPython timings):

    In [292]: y = np.random.randint(0, 30, 1000)

    In [293]: %%timeit
       .....: b = np.histogram(y, bins=[0]+x)[0]
       .....: d = { k:v for k,v in zip(x, b)}
       .....:
    1000 loops, best of 3: 185 µs per loop

    In [294]: y = list(y)

    In [295]: timeit Counter(x[bisect.bisect_left(x, item)] for item in y)
    100 loops, best of 3: 3.84 ms per loop

    In [311]: timeit dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
    100 loops, best of 3: 3.75 ms per loop

Short answer:

    dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))

Long answer:

First, iterate over y and keep the elements that are smaller than a given threshold. With a threshold of 10, we get:

    >>> [n_y for n_y in y if n_y < 10]
    [1, 2, 3]

Then replace the fixed 10 with each value of x in turn:

    >>> [[n_y for n_y in y if n_y < n_x] for n_x in x]
    [[1, 2, 3], [1, 2, 3, 15], [1, 2, 3, 15, 22, 27]]

Finally, we need to pair these results with the original values of x. This is where zip comes in handy (in Python 3, zip returns an iterator, so wrap it in list() to see the pairs):

    >>> list(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
    [(10, [1, 2, 3]), (20, [1, 2, 3, 15]), (30, [1, 2, 3, 15, 22, 27])]

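This gives a list of (key, value) tuples, so passing it to dict gives the final result. Note that this returns the matching elements themselves, cumulatively (every element of y below each value of x), rather than the per-range counts shown in the question: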

    >>> dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
    {10: [1, 2, 3], 20: [1, 2, 3, 15], 30: [1, 2, 3, 15, 22, 27]}
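
To get the per-range counts the question asks for from this approach, one option is to count the cumulative matches and subtract each bucket's predecessor. A minimal sketch, assuming x is sorted ascending; note it is still a nested loop, so it does not address the performance concern:

```python
x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]

# Cumulative counts: how many y values fall below each x.
below = [sum(1 for n_y in y if n_y < n_x) for n_x in x]  # [3, 4, 6]

# Per-range counts: subtract the cumulative count for the previous bound
# (0 for the first bound). zip stops at the shortest input, len(x).
per_range = {n_x: c - prev for n_x, c, prev in zip(x, below, [0] + below)}
print(per_range)  # {10: 3, 20: 1, 30: 2}
```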

If the values in x are always consecutive multiples of 10 (a step of 10), I would do it like this:

    >>> y = [1, 2, 3, 15, 22, 27]
    >>> step = 10
    >>> from collections import Counter
    >>> Counter(n - n%step + step for n in y)
    Counter({10: 3, 30: 2, 20: 1})
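
The same trick can be written with floor division, which makes it easy to generalize to evenly spaced bounds that do not start at a multiple of the step. A sketch under that assumption; the function `count_by_step` and its `start` parameter are hypothetical, and with `start=0` each key equals `n - n % step + step`:

```python
from collections import Counter


def count_by_step(values, step, start=0):
    """Assumes the bounds lie on the evenly spaced grid start + k * step.
    Each value is counted under the smallest grid bound strictly greater
    than it, so a value equal to a bound goes to the next bucket."""
    return Counter(start + ((n - start) // step + 1) * step for n in values)


print(count_by_step([1, 2, 3, 15, 22, 27], 10))
# Counter({10: 3, 30: 2, 20: 1})
```

Because this is O(1) arithmetic per value with no lookup, it is the cheapest option when the spacing assumption holds; for arbitrary bounds, fall back to the bisect approach.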

Source: https://habr.com/ru/post/1502090/

