Python - counting how many values fall into specific ranges in a list

So basically I want to count how many floating-point values in a list fall within each of several ranges. For example: a list of scores (all out of 100) is entered by the user, and they are sorted into groups of ten. How many scores fall in 0-10, 10-20, 20-30, etc.? Like a distribution of test scores. I know that I can use the count function, but since I am not looking for specific numbers, I am having problems. Is it possible to combine count and range? Thanks for any help.

+4
4 answers

To group the data, divide each value by the width of the interval. To count the number in each group, consider using collections.Counter. Here is a worked-out example with documentation and a test:

    from collections import Counter

    def histogram(iterable, low, high, bins):
        '''Count elements from the iterable into evenly spaced bins

        >>> scores = [82, 85, 90, 91, 70, 87, 45]
        >>> histogram(scores, 0, 100, 10)
        [0, 0, 0, 0, 1, 0, 0, 1, 3, 2]

        '''
        # Width of one bin, forced to float so that // below floors floats
        step = (high - low + 0.0) / bins
        # Key each element by the index of the bin it falls into
        dist = Counter((float(x) - low) // step for x in iterable)
        return [dist[b] for b in range(bins)]

    if __name__ == '__main__':
        import doctest
        print(doctest.testmod())
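One thing to watch with the function above: a value exactly equal to high (for example a score of 100) gets bin index 10, which range(bins) never reads, so it is silently left out. Below is a minimal sketch of one way around that, assuming every value lies in [low, high]. The name histogram_clamped is hypothetical and not part of the original answer.

```python
from collections import Counter

def histogram_clamped(iterable, low, high, bins):
    """Like histogram(), but a value equal to `high` lands in the last bin.

    Assumes low <= x <= high for every x (anything above high is clamped too).
    """
    step = (high - low) / float(bins)
    dist = Counter(
        min(int((x - low) // step), bins - 1)  # clamp the top edge into the last bin
        for x in iterable
    )
    return [dist[b] for b in range(bins)]

print(histogram_clamped([45, 70, 100], 0, 100, 10))
# [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
```

Whether 100 belongs in the last bin or in a bin of its own is a design choice; clamping matches what numpy.histogram does with its closed last bin.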
+6
    decs = [int(x/10) for x in scores]

This maps scores 0-9 → 0, 10-19 → 1, etc. Then just count the occurrences of 0, 1, 2, 3, etc. (via something like collections.Counter) and map back to the ranges from there.
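As a rough sketch of the whole round trip (decade index, count, back to a range label), assuming integer scores and the sample data used elsewhere in this thread. The by_range name and the label format are just illustrative choices.

```python
from collections import Counter

scores = [82, 85, 90, 91, 70, 87, 45]  # sample data, assumed for illustration

# Map each score to its decade index: 0-9 -> 0, 10-19 -> 1, ...
decs = [int(x // 10) for x in scores]
counts = Counter(decs)

# Translate each decade index back into a readable range label.
by_range = {"%d-%d" % (d * 10, d * 10 + 9): counts[d] for d in sorted(counts)}
print(by_range)
# {'40-49': 1, '70-79': 1, '80-89': 3, '90-99': 2}
```

Only decades that actually occur show up here; looping over range(10) instead of sorted(counts) would list the empty ones as zero as well.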

+4

If you are fine with using the external NumPy library, you just need to call numpy.histogram():

    >>> import numpy
    >>> data = [82, 85, 90, 91, 70, 87, 45]
    >>> counts, bins = numpy.histogram(data, bins=10, range=(0, 100))
    >>> counts
    array([0, 0, 0, 0, 1, 0, 0, 1, 3, 2])
    >>> bins
    array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])
+4

This method uses bisect, which may be more efficient, but you need to sort the scores first.

    from bisect import bisect
    import random

    scores = [random.randint(0, 100) for _ in range(100)]
    bins = [20, 40, 60, 80, 100]  # upper limit of each bin

    scores.sort()
    counts = []
    last = 0
    for range_max in bins:
        # Index just past the last score <= range_max
        i = bisect(scores, range_max, last)
        counts.append(i - last)
        last = i
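Since the snippet above runs on random data, here is the same loop wrapped in a function and applied to a fixed list so the result can be checked by hand. This is only a sketch; the function name count_in_bins and the sample scores are assumptions for illustration.

```python
from bisect import bisect

def count_in_bins(scores, upper_limits):
    """Count scores per bin, where upper_limits lists each bin's upper limit in ascending order."""
    scores = sorted(scores)  # bisect requires sorted input
    counts = []
    last = 0
    for range_max in upper_limits:
        # Index just past the last score <= range_max, searching from `last` onward
        i = bisect(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts

print(count_in_bins([82, 85, 90, 91, 70, 87, 45], [20, 40, 60, 80, 100]))
# [0, 0, 1, 1, 5]
```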

I would not expect you to install numpy just for this, but if you already have numpy, you can use numpy.histogram.

UPDATE

Firstly, using bisect is more flexible. Using [i//n for i in scores] requires all bins to be the same size. Using bisect allows bins to have arbitrary limits. Also, i//n means that the ranges are [lo, hi). Using bisect, the ranges are (lo, hi], but you can use bisect_left if you want [lo, hi).
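To make the boundary difference concrete, here is a small sketch with uneven bin limits and scores that sit exactly on them. The helper count_by_edges and the sample values are hypothetical, chosen only to show the two behaviours side by side.

```python
from bisect import bisect_left, bisect_right

def count_by_edges(scores, edges, find=bisect_right):
    """Count scores per bin; `edges` holds each bin's upper limit, ascending."""
    ordered = sorted(scores)
    counts, last = [], 0
    for edge in edges:
        i = find(ordered, edge, last)
        counts.append(i - last)
        last = i
    return counts

scores = [10, 20, 20, 35, 50]
edges = [10, 20, 50]  # uneven bin widths are fine here

right_closed = count_by_edges(scores, edges, bisect_right)  # bins (lo, hi]
left_closed = count_by_edges(scores, edges, bisect_left)    # bins [lo, hi)
print(right_closed)  # [1, 2, 2]
print(left_closed)   # [0, 1, 3]
```

Note that with bisect_left a score equal to the last limit (50 here) is not counted in any bin, so the last limit would need to be set above the maximum possible score.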

Secondly, bisect is faster; see the timings below. I replaced scores.sort() with the slower sorted(scores), because sorting is the slowest step and I did not want to skew the timings with a pre-sorted array. However, the OP says that his/her array is already sorted, so in that case bisect may make even more sense.

    setup = """
    from bisect import bisect_left
    import random
    from collections import Counter

    def histogram(iterable, low, high, bins):
        step = (high - low) / bins
        dist = Counter(((x - low + 0.) // step for x in iterable))
        return [dist[b] for b in range(bins)]

    def histogram_bisect(scores, groups):
        scores = sorted(scores)
        counts = []
        last = 0
        for range_max in groups:
            i = bisect_left(scores, range_max, last)
            counts.append(i - last)
            last = i
        return counts

    def histogram_simple(scores, bin_size):
        scores = [i//bin_size for i in scores]
        return [scores.count(i) for i in range(max(scores)+1)]

    scores = [random.randint(0,100) for _ in range(100)]
    bins = range(10, 101, 10)
    """

    from timeit import repeat

    t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000)
    print(min(t))
    # .95

    t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000)
    print(min(t))
    # .22

    t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000)
    print(min(t))
    # .36
+2

Source: https://habr.com/ru/post/1399526/

