Word frequency in Python not working

I am trying to count word frequencies in a text file using python.

I am using the following code:

    openfile = open("total data", "r")
    linecount = 0
    for line in openfile:
        if line.strip():
            linecount += 1
    count = {}
    while linecount > 0:
        line = openfile.readline().split()
        for word in line:
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
        linecount -= 1
    print count

But I get an empty dictionary: print count outputs {}.

I also tried using:

    from collections import defaultdict
    ...
    count = defaultdict(int)
    ...
    if word in count:
        count[word] = count.get(word, 0) + 1

But again I get an empty dictionary. I do not understand what I am doing wrong. Can someone point out the problem?

3 answers

The for line in openfile: loop moves the file pointer to the end of the file, so every later readline() call returns an empty string and no words are ever counted. To read the data again, either move the pointer back to the start of the file (openfile.seek(0)) or reopen the file.
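
A minimal sketch of that behaviour, assuming an in-memory io.StringIO object as a stand-in for the real file (it follows the same read-pointer rules as an opened text file): the first pass consumes every line, readline() then returns an empty string, and seek(0) makes the lines readable again.

```python
import io

# io.StringIO stands in for open("total data", "r") so the sketch runs on its own
openfile = io.StringIO("the cat\nthe dog\n")

first_pass = list(openfile)    # iterating reads every line; the pointer is now at the end
at_end = openfile.readline()   # readline() at end of file returns '' (and ''.split() is [])
second_pass = list(openfile)   # nothing left to read
openfile.seek(0)               # move the pointer back to the start of the file
third_pass = list(openfile)    # all lines are available again

print(first_pass)     # ['the cat\n', 'the dog\n']
print(repr(at_end))   # ''
print(second_pass)    # []
print(third_pass)     # ['the cat\n', 'the dog\n']
```

This is why the while loop in the question runs without errors yet counts nothing: each readline().split() after the for loop yields an empty list, so the inner loop never executes.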

For counting word frequencies, collections.Counter is a better fit:

    from collections import Counter

    with open("total data", "r") as openfile:
        c = Counter()
        for line in openfile:
            words = line.split()
            c.update(words)
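
After the loop, c maps each word to its total. A self-contained run of the same pattern, assuming io.StringIO with made-up sample text in place of the "total data" file, shows how the result can be queried:

```python
import io
from collections import Counter

# io.StringIO stands in for open("total data", "r") so the example runs on its own
openfile = io.StringIO("the cat sat\non the mat\n")

c = Counter()
for line in openfile:
    c.update(line.split())  # add one to the count of each word on this line

print(c["the"])          # 2
print(c["dog"])          # 0: a missing word counts as zero instead of raising KeyError
print(c.most_common(1))  # [('the', 2)]
```

Because a Counter returns 0 for missing keys, the if/else bookkeeping from the question is not needed at all.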

Add openfile.seek(0) immediately after count is initialized. This moves the read pointer back to the beginning of the file.
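
For reference, the question's first version with only that line added might look like the following sketch, rewritten for Python 3 as a function that takes an open file object. The name count_words_two_pass is hypothetical, and io.StringIO stands in for the real file in the usage line.

```python
import io

def count_words_two_pass(openfile):
    """The question's algorithm with openfile.seek(0) added (hypothetical helper name)."""
    # First pass: count the non-blank lines; this leaves the pointer at end of file
    linecount = 0
    for line in openfile:
        if line.strip():
            linecount += 1

    count = {}
    openfile.seek(0)  # the fix: rewind before reading the lines a second time

    # Second pass: read that many lines again and tally each word.
    # Quirk kept from the original: blank lines are read here but were not
    # counted above, so a file containing blank lines stops before its last lines.
    while linecount > 0:
        line = openfile.readline().split()
        for word in line:
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
        linecount -= 1
    return count

print(count_words_two_pass(io.StringIO("a b a\nb c\n")))  # {'a': 2, 'b': 2, 'c': 1}
```

Reading the file twice is not necessary for word counting; a single loop over the lines, as in the Counter version, avoids both the rewind and the blank-line quirk.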


This is a much more direct way to count the frequency of words in a file:

    from collections import Counter

    def count_words_in_file(file_path):
        with open(file_path) as f:
            return Counter(f.read().split())

Example:

    >>> count_words_in_file('C:/Python27/README.txt').most_common(10)
    [('the', 395), ('to', 202), ('and', 129), ('is', 120), ('you', 111),
     ('a', 107), ('of', 102), ('in', 90), ('for', 84), ('Python', 69)]
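
To try count_words_in_file without an existing file, one option is to write some sample text to a temporary file first; the tempfile usage and the sample text are assumptions for illustration. Since split() only splits on whitespace, case and punctuation are kept as-is, which is why "The" and "the" are counted separately here.

```python
import os
import tempfile
from collections import Counter

def count_words_in_file(file_path):
    # Read the whole file, split on whitespace and count each resulting word
    with open(file_path) as f:
        return Counter(f.read().split())

# Write a small sample file (made-up contents) to a temporary location
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The cat and the hat\nthe end\n")
    path = tmp.name

try:
    counts = count_words_in_file(path)
    print(counts["the"], counts["The"])  # 2 1
finally:
    os.remove(path)  # clean up the temporary file
```

Reading the whole file with f.read() is simple, but for very large files the line-by-line Counter.update loop keeps memory use lower.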

Source: https://habr.com/ru/post/1489278/