Python counting frequency

Suppose I have a list of words, and I want to know how many times each word appears in this list.

The obvious way to do this is:

    words = "apple banana apple strawberry banana lemon"
    uniques = set(words.split())
    freqs = [(item, words.split().count(item)) for item in uniques]
    print(freqs)

But I don't find this code very good, because the program runs through the word list twice: once to build the set, and a second time to count the occurrences.

Of course, I could write a function to run through the list and do the counting, but that wouldn't be very Pythonic. So, is there a more efficient and Pythonic way?

+53
python count frequency counting
May 21 '09 at 15:04
11 answers

defaultdict to the rescue!

    from collections import defaultdict

    words = "apple banana apple strawberry banana lemon"
    d = defaultdict(int)
    for word in words.split():
        d[word] += 1

This runs in O(n).
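
As a rough illustration of the O(n) claim, here is a minimal sketch that puts the question's approach next to the single-pass loop. The function names count_with_rescans and count_single_pass are made up for this example; only the sample string comes from the question. The first version calls list.count once per unique word and so rescans the list each time, while the second touches every word exactly once.

```python
from collections import defaultdict

def count_with_rescans(word_list):
    # Approach from the question: one full scan of the list per unique word.
    return {item: word_list.count(item) for item in set(word_list)}

def count_single_pass(word_list):
    # defaultdict(int) supplies 0 for unseen keys, so one pass is enough.
    counts = defaultdict(int)
    for word in word_list:
        counts[word] += 1
    return dict(counts)

word_list = "apple banana apple strawberry banana lemon".split()

# Both strategies produce the same counts; they differ only in work done.
assert count_with_rescans(word_list) == count_single_pass(word_list)
print(count_single_pass(word_list))
# On Python 3.7+ (insertion-ordered dicts):
# {'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}
```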

+93
May 21, '09 at 15:10

The Counter class in the collections module is specifically designed to solve this problem:

    from collections import Counter

    words = "apple banana apple strawberry banana lemon"
    Counter(words.split())
    # Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
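
Beyond plain counting, a Counter can be queried in a couple of handy ways. The following is a small sketch, assuming Python 3.7 or later for the tie ordering shown; it reuses the sample string from the question and is not part of the original answer.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# Missing keys report 0 instead of raising KeyError.
print(counts["apple"])  # 2
print(counts["kiwi"])   # 0

# most_common(n) returns the n highest counts as (item, count) pairs;
# ties keep the order in which the items were first seen.
print(counts.most_common(2))  # [('apple', 2), ('banana', 2)]
```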
+129
May 21 '09 at

Standard approach:

    from collections import defaultdict

    words = "apple banana apple strawberry banana lemon"
    words = words.split()
    result = defaultdict(int)  # imported by name above, so no "collections." prefix
    for word in words:
        result[word] += 1
    print(result)

groupby one-liner:

    from itertools import groupby

    words = "apple banana apple strawberry banana lemon"
    words = words.split()
    # groupby only merges adjacent equal items, hence the sort
    result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
    print(result)
+11
May 21 '09 at
    # words is assumed to be a list of words here, e.g. the result of words.split()
    freqs = {}
    for word in words:
        freqs[word] = freqs.get(word, 0) + 1  # fetch and increment OR initialize

I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable in my opinion. It is almost identical to Thomas Weigel's solution, but without using exceptions.

This could be slower than using defaultdict() from the collections library, since the value is fetched, incremented, and then assigned again, instead of just being incremented in place. However, += might do exactly the same thing internally.
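
Whether dict.get really is slower is easier to measure than to guess. Below is a minimal timing sketch using the standard timeit module. The test data (the question's words repeated 1000 times) and the function names are assumptions made for illustration, and the numbers will vary by machine and Python version, so no expected timings are shown.

```python
import timeit
from collections import defaultdict

# Hypothetical test data: the question's sample words repeated many times.
word_list = "apple banana apple strawberry banana lemon".split() * 1000

def count_with_get(items):
    # Fetch the current count (or 0), add one, assign it back.
    freqs = {}
    for word in items:
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

def count_with_defaultdict(items):
    # Missing keys are created with value 0 on first access.
    freqs = defaultdict(int)
    for word in items:
        freqs[word] += 1
    return freqs

# Both strategies must agree before their timings are worth comparing.
assert count_with_get(word_list) == count_with_defaultdict(word_list)

for fn in (count_with_get, count_with_defaultdict):
    seconds = timeit.timeit(lambda: fn(word_list), number=100)
    print(f"{fn.__name__}: {seconds:.4f} s for 100 runs")
```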

+9
Jun 11 '09 at 20:21

If you don't want to use the standard dictionary method (looping through the list and incrementing the proper dict key), you can try this:

    >>> from itertools import groupby
    >>> myList = words.split()  # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
    >>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
    [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
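
The sort is what costs O(n log n), and it cannot be skipped: groupby only merges runs of adjacent equal items. A short sketch on the question's sample string shows the difference.

```python
from itertools import groupby

words = "apple banana apple strawberry banana lemon".split()

# Without sorting, equal words are not adjacent, so each run is counted separately.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(words)]
print(unsorted_runs)
# [('apple', 1), ('banana', 1), ('apple', 1), ('strawberry', 1), ('banana', 1), ('lemon', 1)]

# Sorting first makes equal words adjacent; the sort is the O(n log n) step.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(words))]
print(sorted_runs)
# [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]
```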

+7
May 21, '09 at 15:09

Without defaultdict:

    words = "apple banana apple strawberry banana lemon"
    my_count = {}
    for word in words.split():
        try:
            my_count[word] += 1
        except KeyError:
            my_count[word] = 1
+3
May 21 '09 at 15:59

Can't you just use count?

    words = 'the quick brown fox jumps over the lazy gray dog'
    words.count('z')
    # output: 1
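
One caveat worth illustrating: called on a string, count looks for substrings, not whole words. The sample text below is made up for this sketch.

```python
text = "the other theme"

# str.count counts non-overlapping substring matches anywhere in the string,
# so "other" and "theme" each contribute a hit for "the".
print(text.count("the"))          # 3

# Splitting first and using list.count matches whole words only.
print(text.split().count("the"))  # 1
```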
0
Apr 07

I happened to be working on some Spark exercises; here is my solution.

    tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
    print({n: float(tokens.count(n)) / float(len(tokens)) for n in tokens})

Output of the above:

 {'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666} 
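
The comprehension above calls tokens.count once per token, so it rescans the list every time. A possible single-pass variant, sketched here with collections.Counter and assuming Python 3 (where / is true division), computes the same relative frequencies.

```python
from collections import Counter

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

counts = Counter(tokens)  # one pass over the tokens
total = len(tokens)

# Divide each count by the total number of tokens to get its share.
relative = {word: count / total for word, count in counts.items()}

print(relative['fox'])  # 0.16666666666666666
```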
0
Jun 26 '15 at 6:02

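Use reduce() to convert the list into a single dict.

    from functools import reduce  # reduce is no longer a builtin in Python 3

    words = "apple banana apple strawberry banana lemon"
    # d.update(...) returns None, so "or d" hands the updated dict to the next step
    reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})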

returns

 {'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2} 
0
Feb 23 '16 at 18:03
    words = "apple banana apple strawberry banana lemon"
    w = words.split()
    e = list(set(w))
    for i in e:
        print(w.count(i))  # Prints frequency of every word in the list

Hope this helps!

0
Nov 12 '17 at 16:17

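The following takes some extra loops, but it is still another method.

    def func(tup):
        # Sort key: the last element of a (word, count) tuple, i.e. the count
        return tup[-1]

    def print_words(filename):
        # Open the file that was passed in rather than a hard-coded name
        with open(filename, 'r') as f:
            whole_content = f.read().lower()
        print(whole_content)
        list_content = whole_content.split()
        counts = {}  # renamed from "dict" to avoid shadowing the builtin
        for one_word in list_content:  # first pass: initialise every word to 0
            counts[one_word] = 0
        for one_word in list_content:  # second pass: count occurrences
            counts[one_word] += 1
        print(counts.items())
        print(sorted(counts.items(), key=func))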
-1
Feb 27 '13 at 2:17