Python counting frequency

Question


Suppose I have a list of words, and I want to know how many times each word appears in this list.

The obvious way to do this is:

    words = "apple banana apple strawberry banana lemon"
    uniques = set(words.split())
    freqs = [(item, words.split().count(item)) for item in uniques]
    print(freqs)

But I don't think this code is very good, because the program traverses the list of words twice: once to build the set, and a second time to count the occurrences.

Of course, I could write a function that loops over the list and does the counting, but that would not be very Pythonic. So, is there a more efficient and Pythonic way?

+53

python count frequency counting

Daniyar May 21 '09 at 15:04


11 answers

The Counter class in the collections module is specifically designed to solve this problem:

    from collections import Counter

    words = "apple banana apple strawberry banana lemon"
    Counter(words.split())
    # Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
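As a small follow-up sketch (the names `counts` and `top_two` are illustrative, not from the answer), a Counter can also be queried directly: `most_common()` ranks the entries, and looking up a word that never occurred gives 0 instead of raising KeyError.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# The two most frequent words as (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)

# A word that never occurred reports 0 rather than raising KeyError.
print(counts["kiwi"])
```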

+129

sykora May 21 '09 at 15:16


Standard approach:

    from collections import defaultdict

    words = "apple banana apple strawberry banana lemon"
    words = words.split()
    result = defaultdict(int)  # missing keys start at 0
    for word in words:
        result[word] += 1
    print(result)

A one-liner using groupby:

    from itertools import groupby

    words = "apple banana apple strawberry banana lemon"
    words = words.split()
    # groupby only merges adjacent equal items, hence the sort
    result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
    print(result)

+11

nosklo May 21 '09 at 03:11


    freqs = {}
    for word in words:
        freqs[word] = freqs.get(word, 0) + 1  # fetch and increment OR initialize

I think this amounts to the same thing as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable in my opinion, and almost identical to Thomas Weigel's solution, but without using exceptions.

This may be slower than using defaultdict() from the collections module, since the value is fetched, incremented, and then assigned again instead of simply being incremented in place. However, += may well do the same thing internally.
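One way to check the speed question rather than guess is to time both variants. The following is only a rough sketch: the function names, the repeated input, and the repetition count are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import timeit
from collections import defaultdict

# Repeat the sample sentence so the loops have enough work to measure.
words = "apple banana apple strawberry banana lemon".split() * 1000


def count_with_get(items):
    # Plain dict: fetch the current count (or 0), add one, store it back.
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs


def count_with_defaultdict(items):
    # defaultdict(int): missing keys are created with the value 0.
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs


# Both approaches must agree before their timings are worth comparing.
assert count_with_get(words) == dict(count_with_defaultdict(words))

for fn in (count_with_get, count_with_defaultdict):
    seconds = timeit.timeit(lambda: fn(words), number=20)
    print(fn.__name__, round(seconds, 4))
```

Both loops are O(n), so any difference is a constant factor.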

+9

hopla Jun 11 '09 at 20:21


If you do not want to use the standard dictionary method (looping through the list and incrementing the right dict key), you can try the following:

    >>> from itertools import groupby
    >>> myList = words.split()  # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
    >>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
    [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
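The sort is what makes this approach work, and it is also where the O(n log n) cost comes from. A minimal sketch of the difference (the names `unsorted_runs` and `sorted_runs` are illustrative):

```python
from itertools import groupby

my_list = "apple banana apple strawberry banana lemon".split()

# Without sorting, groupby only merges adjacent equal items,
# so 'apple' and 'banana' each appear as two separate runs.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(my_list)]

# Sorting first brings equal words together, at O(n log n) cost.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(my_list))]

print(unsorted_runs)
print(sorted_runs)
```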

+7

Nick Presta May 21 '09 at 15:09


Without defaultdict:

    words = "apple banana apple strawberry banana lemon"
    my_count = {}
    for word in words.split():
        try:
            my_count[word] += 1
        except KeyError:
            # first occurrence of this word
            my_count[word] = 1

+3

Thomas Weigel May 21 '09 at 15:59


Can you just use count?

    words = 'the quick brown fox jumps over the lazy gray dog'
    words.count('z')
    # output: 1
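One caveat worth illustrating: on a string, `count` counts substrings, not whole words, so it can over-count. A short sketch with a made-up sentence (the variable names are illustrative):

```python
text = "the other theme"

# str.count matches substrings: 'the', 'o-the-r' and 'the-me' all match.
substring_hits = text.count("the")

# Splitting first and counting on the list matches whole words only.
word_hits = text.split().count("the")

print(substring_hits, word_hits)
```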

0

Antonio Apr 07


I happened to work on some Spark exercises, here is my solution.

    tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
    # relative frequency of each token
    print({n: float(tokens.count(n)) / float(len(tokens)) for n in tokens})

Output of the above:

    {'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}
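Calling `tokens.count(n)` for every token rescans the whole list each time, which is quadratic in the list length. A possible single-pass variant, assuming Python 3 division and using Counter as suggested elsewhere in this thread (the extra 'fox' and the name `relative` are illustrative additions):

```python
from collections import Counter

# One 'fox' is repeated so the frequencies are not all identical.
tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog', 'fox']

counts = Counter(tokens)  # single pass over the list
total = len(tokens)

# Relative frequency of each distinct token.
relative = {word: count / total for word, count in counts.items()}
print(relative)
```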

0

javaidiot Jun 26 '15 at 6:02


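Use reduce() to fold the list into a single dict.

    from functools import reduce  # required on Python 3

    words = "apple banana apple strawberry banana lemon"
    # d.update() returns None, so "or d" passes the dict on to the next step
    reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

This returns:

    {'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}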

0

Gadi Feb 23 '16 at 18:03


    words = "apple banana apple strawberry banana lemon"
    w = words.split()
    e = list(set(w))
    for i in e:
        print(w.count(i))  # prints the frequency of every word in the list

Hope this helps!

0

Varun Shaandhesh Nov 12 '17 at 16:17


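This uses a couple of extra loops, but it is another method:

    def func(tup):
        # sort key: the count, i.e. the last element of each (word, count) pair
        return tup[-1]

    def print_words(filename):
        # read the file named by the argument
        with open(filename, 'r') as f:
            whole_content = f.read().lower()
        print(whole_content)
        list_content = whole_content.split()
        counts = {}
        # first pass: initialise every word to 0
        for one_word in list_content:
            counts[one_word] = 0
        # second pass: count the occurrences
        for one_word in list_content:
            counts[one_word] += 1
        print(counts.items())
        print(sorted(counts.items(), key=func))

-1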

Prabhu S Feb 27 '13 at 2:17


Triptych · Accepted Answer · 2009-05-21 15:10

defaultdict to the rescue!

    from collections import defaultdict

    words = "apple banana apple strawberry banana lemon"
    d = defaultdict(int)
    for word in words.split():
        d[word] += 1

This runs in O(n).
