Computing a mode in a multimodal list in Python

Question

I am trying to calculate the mode (most common value) of a list of values in Python. I came up with a solution that gave the wrong answer anyway, but then I realized that my data could be multimodal,

i.e. 1,1,2,3,4,4 has modes 1 and 4.

Here is what I came up with so far:

    def mode(valueList):
        frequencies = {}
        for value in valueList:
            if value in frequencies:
                frequencies[value] += 1
            else:
                frequencies[value] = 1
        mode = max(frequencies.itervalues())
        return mode

I think the problem here is that I am returning a value, not a pointer to the maximum value. In any case, can anyone suggest a better way to do this that works when there is more than one mode? Or, failing that, how can I fix what I have so that it determines a single mode?

As you can probably tell, I'm very new to Python; thanks for the help.

Edit: I should have mentioned that I am on Python 2.4.

+4

python statistics

Captastic Mar 05 '12 at 13:46

3 answers

In Python >= 2.7, use collections.Counter for frequency tables.

    from collections import Counter
    from itertools import takewhile

    data = [1,1,2,3,4,4]
    freq = Counter(data)
    mostfreq = freq.most_common()
    modes = list(takewhile(lambda x_f: x_f[1] == mostfreq[0][1], mostfreq))

Note the use of the anonymous function (lambda), which checks whether a pair x_f = (_, f) has the same frequency f as the most common element.
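One thing worth spelling out: the takewhile result holds (value, count) pairs, not bare values. A minimal sketch that wraps the same idea in a function and returns only the values follows. The name modes_with_counter is made up for illustration, and it assumes Python 2.7 or later, so it does not help on 2.4.

```python
from collections import Counter
from itertools import takewhile

def modes_with_counter(values):
    """Return every value that shares the highest count (hypothetical helper)."""
    # most_common() yields (value, count) pairs, highest count first.
    ranked = Counter(values).most_common()
    if not ranked:
        return []  # no data, so no mode
    top_count = ranked[0][1]
    # Keep pairs while the count still equals the top count, then drop the counts.
    tied = takewhile(lambda pair: pair[1] == top_count, ranked)
    return [value for value, count in tied]

print(modes_with_counter([1, 1, 2, 3, 4, 4]))  # the two modes, 1 and 4
```

Because most_common() is sorted by count, every mode sits at the front of the list, which is why stopping at the first lower count is enough.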

+5

Fred foo Mar 05 '12 at 13:51

You can keep track of the most frequent value while iterating, something like this:

    def mode(valueList):
        frequencies = {}
        mx = None
        for value in valueList:
            if value in frequencies:
                frequencies[value] += 1
            else:
                frequencies[value] = 1
            if not mx or frequencies[value] > mx[1]:
                mx = (value, frequencies[value])
        mode = mx[0]
        return mode
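The function above reports a single mode, namely the first value to reach the highest count. As a rough sketch of how the same one-pass idea could be stretched to multimodal data, the version below keeps a list of current leaders instead of one pair. The name modes_single_pass is invented for this example and is not part of the original answer.

```python
def modes_single_pass(values):
    """Return all modes while counting in a single pass (hypothetical helper)."""
    frequencies = {}
    best = 0      # highest count seen so far
    leaders = []  # values whose count currently equals best
    for value in values:
        count = frequencies.get(value, 0) + 1
        frequencies[value] = count
        if count > best:
            # A new leader pulls ahead, so the old ties are discarded.
            best = count
            leaders = [value]
        elif count == best:
            # This value has just caught up with the current leaders.
            leaders.append(value)
    return leaders

print(modes_single_pass([1, 1, 2, 3, 4, 4]))  # [1, 4]
```

It sticks to plain dictionaries and dict.get, so it should not depend on anything newer than what the question's own code uses.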

Another approach for multiple modes uses nlargest, which gives you the entries with the N largest dictionary values:

    from heapq import nlargest
    import operator

    def mode(valueList, nmodes):
        frequencies = {}
        for value in valueList:
            frequencies[value] = frequencies.get(value, 0) + 1
        return [x[0] for x in nlargest(nmodes,frequencies.iteritems(),operator.itemgetter(1))]
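A caveat with the nlargest approach: the caller has to know how many modes there are, and asking for too many returns values that are not modes at all. The sketch below shows this with a Python 3 spelling of the same function; the name top_values is made up here. As far as I know, the key argument of heapq.nlargest only appeared in Python 2.5, so treat this as unlikely to work unchanged on 2.4.

```python
from heapq import nlargest
from operator import itemgetter

def top_values(values, n):
    """Return the n most frequent values (hypothetical name, Python 3 spelling)."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    # items() replaces iteritems() on Python 3; itemgetter(1) ranks pairs by count.
    ranked = nlargest(n, frequencies.items(), key=itemgetter(1))
    return [value for value, count in ranked]

data = [1, 1, 2, 3, 4, 4]
two = top_values(data, 2)    # the two real modes, 1 and 4
three = top_values(data, 3)  # also pulls in a value that occurs only once
print(two, three)
```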

+1

Not_a_golfer Mar 05 '12 at 13:52

senderle · Accepted Answer · 2012-03-05T13:53:34+0000

Well, the first problem is that yes, you are returning the value in frequencies rather than the key. This means that you get the count of the mode, not the mode itself. Normally, to get the mode, you would use the key keyword argument to max, like so:

    >>> max(frequencies, key=frequencies.get)

But in 2.4, that doesn't exist! Here's an approach that I believe will work in version 2.4:

    >>> import random
    >>> l = [random.randrange(0, 5) for _ in range(50)]
    >>> frequencies = {}
    >>> for i in l:
    ...     frequencies[i] = frequencies.get(i, 0) + 1
    ...
    >>> frequencies
    {0: 11, 1: 13, 2: 8, 3: 8, 4: 10}
    >>> mode = max((v, k) for k, v in frequencies.iteritems())[1]
    >>> mode
    1
    >>> max_freq = max(frequencies.itervalues())
    >>> modes = [k for k, v in frequencies.iteritems() if v == max_freq]
    >>> modes
    [1]

I prefer the decorate-sort-undecorate idiom to the cmp keyword. I think it is more readable. Maybe that's just me.
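To make the decorate-sort-undecorate remark concrete, here is a rough sketch of the same steps written out one at a time. The function name mode_dsu is invented for this illustration, and it uses items() so that it also runs on Python 3; on 2.4, iteritems() as in the session above does the same job.

```python
def mode_dsu(values):
    """Return one mode via decorate-sort-undecorate (hypothetical helper)."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    # Decorate: put the count first so tuple comparison ranks by count.
    decorated = [(count, value) for value, count in frequencies.items()]
    # "Sort": max() is enough here, since only the top pair is needed.
    # Note that max() raises ValueError if the input was empty.
    best_count, best_value = max(decorated)
    # Undecorate: drop the count and keep the value.
    return best_value

print(mode_dsu([1, 1, 2, 3, 4, 4]))  # 4: ties on count fall back to the larger value
```

The tie-break is a side effect of tuple comparison: when two counts are equal, the values themselves are compared, so the values need to be comparable with each other.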
