Word frequency in Python not working

Question

I am trying to count word frequencies in a text file using Python.

I am using the following code:

    openfile=open("total data", "r")
    linecount=0
    for line in openfile:
        if line.strip():
            linecount+=1
    count={}
    while linecount>0:
        line=openfile.readline().split()
        for word in line:
            if word in count:
                count[word]+=1
            else:
                count[word]=1
        linecount-=1
    print count

But I get an empty dictionary: "print count" gives {} as output.

I also tried using:

    from collections import defaultdict
    .
    .
    count=defaultdict(int)
    .
    .
    if word in count:
        count[word]=count.get(word,0)+1

But again I get an empty dictionary. I do not understand what I am doing wrong. Can someone point out the mistake?

+4

python dictionary

nish Jul 02 '13 at 13:14

3 answers

Add openfile.seek(0) immediately after the initialization of count. This moves the read pointer back to the top of the file.
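As a rough sketch of that fix, the question's code with the rewind added might look like the following. The temporary file and its contents are hypothetical stand-ins for "total data", and print() is written as a function so the snippet also runs on Python 3. One adjustment goes beyond the suggestion itself: blank lines are skipped without decrementing linecount, because the first loop did not count them.

```python
import os
import tempfile

# Hypothetical sample data standing in for the "total data" file.
sample_text = "the cat sat\n\non the mat\nthe end\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample_text)
    path = tmp.name

openfile = open(path, "r")

# First pass: count the non-blank lines. This consumes the whole file.
linecount = 0
for line in openfile:
    if line.strip():
        linecount += 1

count = {}
openfile.seek(0)  # rewind: the loop above left the pointer at end of file

# Second pass: read the lines again and count the words.
while linecount > 0:
    words = openfile.readline().split()
    if not words:
        continue  # blank line: it was not counted above, so do not decrement
    for word in words:
        count[word] = count.get(word, 0) + 1
    linecount -= 1

openfile.close()
os.remove(path)  # clean up the temporary sample file
print(count)
```

Without the seek(0) call, the second loop would see only empty reads and count would stay empty.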

+1

eduffy Jul 02 '13 at 13:18

This is a much more direct way to count the frequency of words in a file:

    from collections import Counter

    def count_words_in_file(file_path):
        with open(file_path) as f:
            return Counter(f.read().split())

Example:

    >>> count_words_in_file('C:/Python27/README.txt').most_common(10)
    [('the', 395), ('to', 202), ('and', 129), ('is', 120), ('you', 111), ('a', 107), ('of', 102), ('in', 90), ('for', 84), ('Python', 69)]

+1

Inbar rose Jul 02 '13 at 13:21

Ashwini chaudhary · Accepted Answer · 2013-07-02T13:17:03+0000

The for line in openfile: loop moves the file pointer to the end of the file, so every later readline() call returns an empty string and no words are counted. If you want to read the data again, either move the pointer back to the top of the file with openfile.seek(0) or reopen the file.
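The behaviour can be checked in isolation with a small sketch. It assumes an in-memory io.StringIO object in place of the real file (it supports the same iteration, readline() and seek() calls), and the two sample lines are made up for illustration.

```python
import io

# io.StringIO stands in for the real file here (an assumption for the demo).
openfile = io.StringIO("first line\nsecond line\n")

for line in openfile:
    pass  # iterating consumes the file all the way to the end

at_eof = openfile.readline()       # empty string: nothing is left to read
words_at_eof = at_eof.split()      # empty list: the word loop never runs

openfile.seek(0)                   # move the pointer back to the top
after_seek = openfile.readline()   # the first line is readable again

print(repr(at_eof), words_at_eof, repr(after_seek))
```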

For counting word frequencies, it is better to use collections.Counter:

    from collections import Counter
    with open("total data", "r") as openfile:
        c = Counter()
        for line in openfile:
            words = line.split()
            c.update(words)
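A minimal sketch of how the resulting Counter can be inspected, using the same line-by-line update. The in-memory io.StringIO object and its contents are hypothetical sample data used here in place of the "total data" file.

```python
import io
from collections import Counter

# Hypothetical sample data; io.StringIO replaces open("total data", "r").
sample = io.StringIO("the cat sat\non the mat\nthe end\n")

c = Counter()
with sample as openfile:
    for line in openfile:
        c.update(line.split())  # add each word on this line to the tally

top = c.most_common(1)  # the single most frequent word with its count
print(top)              # [('the', 3)]
print(c["cat"])         # 1
print(c["dog"])         # 0, a Counter returns 0 for words it never saw
```

Because a missing key yields 0 rather than raising KeyError, no membership test is needed before reading or incrementing a count.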
