I am trying to count all the letters in a txt file and then display them in descending order

As the title says:

So far my code works, but I'm having trouble displaying the results in order. Currently, it displays them in an arbitrary order.

    def frequencies(filename):
        infile = open(filename, 'r')
        wordcount = {}
        content = infile.read()
        infile.close()
        counter = {}
        invalid = "''`,.?!:;-_\n\u2014' '"

        for word in content:
            word = content.lower()
            for letter in word:
                if letter not in invalid:
                    if letter not in counter:
                        counter[letter] = content.count(letter)
                        print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help would be greatly appreciated.

4 answers

The sorted output should be produced outside your counting loop; otherwise each letter is printed as soon as it is found, before all the counts are known.

Sorting in descending order is simple with the built-in sorted function (remember to pass the reverse=True argument!).
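A minimal sketch of these two points, assuming the counting loop has already filled a plain dictionary (the counts below are made up for illustration): sorting happens only once counting is done, and reverse=True turns the ascending sort into a descending one.

```python
# Hypothetical counts, as a finished counting loop might leave them.
counter = {'a': 3, 'b': 7, 'c': 5}

# Sort the (letter, count) pairs by their count; reverse=True gives descending order.
ordered = sorted(counter.items(), key=lambda pair: pair[1], reverse=True)

# Print only after counting and sorting are complete.
for letter, count in ordered:
    print('{:8} appears {} times.'.format(letter, count))
```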

However, Python comes with batteries included, and collections.Counter already exists, so it can be as simple as:

    from collections import Counter
    from operator import itemgetter

    def frequencies(filename):
        # Sets are especially optimized for fast lookups, so this is
        # a perfect fit for the invalid characters.
        invalid = set("''`,.?!:;-_\n\u2014' '")
        # Using open in a with block makes sure the file is closed afterwards.
        with open(filename, 'r') as infile:
            # "char for char in ... if ..." is a conditional generator expression
            # that feeds every character that is not invalid to the counter.
            counter = Counter(char for char in infile.read().lower() if char not in invalid)
        # If you want to display the values:
        for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
            print(char, charcount)

Counter also has a most_common method. Since you want to display all the characters and their counts, it may seem unnecessary here, but it is convenient if you only want the x most common characters. (Note that when called without an argument, most_common() returns all elements ordered from most to least common, so it also covers this case.)
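A quick illustration of the two forms (the sample word is arbitrary): most_common() with no argument returns every element from most to least common, while most_common(n) returns only the first n. Elements with equal counts keep the order in which they were first encountered.

```python
from collections import Counter

c = Counter("mississippi")

everything = c.most_common()   # every letter, highest count first
top_two = c.most_common(2)     # only the two most common letters

print(everything)
print(top_two)
```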


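Dictionaries do not keep their items sorted by value (before Python 3.7 they were not guaranteed to preserve any order, and since 3.7 they preserve insertion order, which here is just the order in which letters are first seen). In addition, if you want to count elements in a data set, it is better to use collections.Counter(), which is optimized and more Pythonic for this purpose.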
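To show that Counter replaces the manual dictionary loop (the sample text is invented): it accepts any iterable, and a string is iterated character by character, so a single call counts every character.

```python
from collections import Counter

text = "hello world"

# Manual counting with a plain dict.
manual = {}
for ch in text:
    manual[ch] = manual.get(ch, 0) + 1

# Counter does the same work in a single call.
counted = Counter(text)

print(counted)
```

Unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError.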

Then you can simply use Counter.most_common(N) to get the N most common elements of the Counter object, in descending order of count.

Also, when opening files you can simply use the with statement, which closes the file automatically at the end of the block. And it's better not to print the final result inside your function: instead, you can make the function a generator that yields the formatted lines, and print them wherever you want.

    from collections import Counter

    def frequencies(filename, top_n):
        with open(filename) as infile:
            # Lowercase so 'A' and 'a' are counted together, as in the question.
            content = infile.read().lower()
        invalid = "''`,.?!:;-_\n\u2014' '"
        # Keep only the characters that are not in the invalid set.
        counter = Counter(filter(lambda x: x not in invalid, content))
        for letter, count in counter.most_common(top_n):
            yield '{:8} appears {} times.'.format(letter, count)

Then use a for loop to iterate over the generator:

    for line in frequencies(filename, 100):
        print(line)
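The two ideas in this answer, the with statement and a generator function, can be checked in isolation with a sketch like the one below. It writes a throwaway temporary file so it runs on its own, and letter_lines is a hypothetical helper, not part of the answer's code.

```python
import os
import tempfile
from collections import Counter

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("abba")
    path = tmp.name

with open(path) as infile:
    content = infile.read()
# Leaving the with block closed the file automatically.
print(infile.closed)

def letter_lines(text):
    # Hypothetical generator: yields one formatted line per distinct character.
    for letter, count in Counter(text).most_common():
        yield '{} appears {} times.'.format(letter, count)

lines = letter_lines(content)  # only a generator object; nothing has run yet
for line in lines:             # the caller decides when, and whether, to print
    print(line)

os.remove(path)
```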

You do not need to iterate over words and then over the letters in them: when you iterate over a string (e.g. content), you already get individual characters (strings of length 1). Also, wait until the counting loop has finished before showing the output. After counting, you can sort manually:

    for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
        # do stuff, e.g. print each letter with its count
        print('{:8} appears {} times.'.format(letter, count))
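The first point, that looping over a string already yields single characters, means one loop is enough to do the counting. A small sketch with invented sample text and a reduced set of characters to skip:

```python
content = "Hi, Bob"
invalid = set(", ")  # reduced set of characters to skip, just for this example

counts = {}
for ch in content.lower():  # each ch is a string of length 1
    if ch not in invalid:
        counts[ch] = counts.get(ch, 0) + 1

print(counts)
```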

However, it is better to use collections.Counter:

    from collections import Counter

    # content and invalid are defined as in the question.
    content = filter(lambda x: x not in invalid, content)
    c = Counter(content)
    for letter, count in c.most_common():  # descending order of counts
        print('{:8} appears {} times.'.format(letter, count))
    # for letter, count in c.most_common(n):  # limit to the n most common
    #     print('{:8} appears {} times.'.format(letter, count))

You can sort the dictionary while printing, using the built-in sorted function:

    lettercount = {}
    invalid = "''`,.?!:;-_\n\u2014' '"
    with open('text.file') as infile:
        for c in infile.read().lower():
            if c not in invalid:
                lettercount[c] = lettercount.setdefault(c, 0) + 1
    # Sort the letters by their counts, highest first.
    for letter in sorted(lettercount, key=lettercount.get, reverse=True):
        print("{} appears {} times".format(letter, lettercount[letter]))

Note: I used the setdefault method to set the default value to 0 the first time a letter is encountered.
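For readers unfamiliar with setdefault, a toy example (the word is arbitrary): setdefault returns the stored value when the key already exists, and otherwise inserts the default and returns it. The same example shows why a key function matters when sorting a dictionary: sorted(d) on its own sorts the keys alphabetically, not by count.

```python
counts = {}
for ch in "banana":
    # setdefault stores 0 the first time a letter is seen, then returns the stored value.
    counts[ch] = counts.setdefault(ch, 0) + 1

by_letter = sorted(counts)                               # keys in alphabetical order
by_count = sorted(counts, key=counts.get, reverse=True)  # keys by count, highest first

print(by_letter)
print(by_count)
```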


Source: https://habr.com/ru/post/1013871/

