Read letters in a text file

I am a beginner Python programmer, and I am trying to make a program that counts the number of letters in a text file. Here is what I have so far:

    import string

    text = open('text.txt')
    letters = string.ascii_lowercase

    for i in text:
        text_lower = i.lower()
        text_nospace = text_lower.replace(" ", "")
        text_nopunctuation = text_nospace.strip(string.punctuation)
        for a in letters:
            if a in text_nopunctuation:
                num = text_nopunctuation.count(a)
                print(a, num)

If the text file contains hello bob, I need this output:

    b 2
    e 1
    h 1
    l 2
    o 2

My problem is that it does not work properly when a text file contains more than one line of text or has punctuation marks.

+4
8 answers

This is a very readable way to accomplish what you want using Counter:

    from string import ascii_lowercase
    from collections import Counter

    with open('text.txt') as f:
        print Counter(letter for line in f
                      for letter in line.lower()
                      if letter in ascii_lowercase)

You can iterate the resulting dict to print it in the format you want.
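
For example, a minimal sketch of that step (using Python 3 print syntax and reusing the counter built above), printing each letter and its count in the alphabetical order the question asks for:

    from string import ascii_lowercase
    from collections import Counter

    with open('text.txt') as f:
        counts = Counter(letter for line in f
                         for letter in line.lower()
                         if letter in ascii_lowercase)

    # Print each letter that occurs, in alphabetical order, e.g. "b 2".
    for letter in sorted(counts):
        print(letter, counts[letter])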

+10

You should use collections.Counter

    from collections import Counter

    text = 'aaaaabbbbbccccc'
    c = Counter(text)
    print c

It prints:

 Counter({'a': 5, 'c': 5, 'b': 5}) 

Your text variable should be:

    import string

    text = open('text.txt').read()
    # Filter all characters that are not letters.
    text = filter(lambda x: x in string.letters, text.lower())

To get the desired result:

    for letter, repetitions in c.iteritems():
        print letter, repetitions

In my example, it prints:

    a 5
    c 5
    b 5

For more information, see the collections.Counter documentation.

+1

Using re:

    import re

    context, m = 'some file to search or text', {}
    letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
               'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

    for i in range(len(letters)):
        m[letters[i]] = len(re.findall('{0}'.format(letters[i]), context))
        print '{0} -> {1}'.format(letters[i], m[letters[i]])

All of this is more elegant and cleaner with Counter, though.
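
For comparison, a rough sketch of what the Counter version of this might look like over the same context string (Python 3 print syntax, keeping the same letter -> count output shape):

    from collections import Counter
    from string import ascii_lowercase

    context = 'some file to search or text'
    # Count only lowercase ascii letters; Counter returns 0 for missing keys.
    counts = Counter(c for c in context if c in ascii_lowercase)

    for letter in ascii_lowercase:
        print('{0} -> {1}'.format(letter, counts[letter]))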

+1
    import string

    fp = open('text.txt', 'r')
    file_list = fp.readlines()
    print file_list

    freqs = {}
    for line in file_list:
        line = filter(lambda x: x in string.letters, line.lower())
        for char in line:
            if char in freqs:
                freqs[char] += 1
            else:
                freqs[char] = 1
    print freqs
+1

Just for completeness, if you want to do this without using Counter, here is another very short way using a generator expression and the dict builtin:

    from string import ascii_lowercase as letters

    with open("text.txt") as f:
        text = f.read().lower()
    print dict((l, text.count(l)) for l in letters)

f.read() will read the contents of the entire file into the text variable (maybe a bad idea if the file is really large); then we use a generator expression to create (letter, count in text) pairs and pass them to the dict builtin to build a dictionary. With Python 2.7+, you can also use {l: text.count(l) for l in letters}, which is even shorter and slightly more readable.

Note, however, that this will scan the text several times, once for each letter, while Counter scans it only once and updates the counts for all letters as it goes.
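
As a rough sketch, here is the Python 2.7+ dictionary comprehension form mentioned above as a complete snippet (Python 3 print syntax), printing only the letters that actually occur; the same multiple-pass caveat applies:

    from string import ascii_lowercase as letters

    with open("text.txt") as f:
        text = f.read().lower()

    # One str.count() pass over the text per letter.
    counts = {l: text.count(l) for l in letters}
    for letter in sorted(counts):
        if counts[letter]:
            print(letter, counts[letter])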

+1

You could divide the problem into two simpler tasks:

    #!/usr/bin/env python
    import fileinput  # accept input from stdin and/or files specified at command-line
    from collections import Counter
    from itertools import chain
    from string import ascii_lowercase

    # 1. count frequencies of all characters (bytes on Python 2)
    freq = Counter(chain.from_iterable(fileinput.input()))  # read one line at a time

    # 2. print frequencies of ascii letters
    for c in ascii_lowercase:
        n = freq[c] + freq[c.upper()]  # merge lower- and upper-case occurrences
        if n != 0:
            print(c, n)
0

Another way:

    import sys
    from collections import defaultdict

    read_chunk_size = 65536
    freq = defaultdict(int)
    # Read the input in fixed-size chunks until EOF.
    for chunk in iter(lambda: sys.stdin.read(read_chunk_size), ''):
        for c in chunk:
            freq[ord(c.lower())] += 1

    for symbol, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
        print(chr(symbol), count)

It prints the most frequently encountered characters first.

The character-counting loop needs only O(1) memory and can process arbitrarily large files, because it reads the input in read_chunk_size chunks.
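
For example, assuming the script above is saved as count_letters.py (a name chosen only for illustration), a file could be fed to it through stdin:

    python count_letters.py < text.txt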

0
    import sys

    def main():
        try:
            fileCountAllLetters = file(sys.argv[1], 'r')
            print "Count all your letters: ", len(fileCountAllLetters.read())
        except IndexError:
            print "You forgot to pass a file as an argument!"
        except IOError:
            print "There is no such file in your folder!"

    main()

python file.py countlettersfile.txt

-1

Source: https://habr.com/ru/post/1500830/

