Python and CMD regular expressions

I am having problems with a piece of Python work. I have to write a program that runs from CMD. It should open the file that the user names and count all of the alphabetic characters in it.

So far I have it working to the point where I can run it from CMD and specify the file to open. I messed around with regular expressions, but can't figure out how to count individual characters. Any ideas? Sorry if I explained this poorly.

    import sys
    import re

    filename = raw_input()
    count = 0
    datafile = open(filename, 'r')
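The snippet above imports sys but never uses it. One way to take the file name from the CMD command line instead of raw_input() is sys.argv; below is a minimal Python 3 sketch of that idea (not the asker's code). The script name count_letters.py and the function count_letters are hypothetical, and "alphabetic" is assumed to mean whatever str.isalpha() accepts.

```python
import sys


def count_letters(path):
    """Return the number of alphabetic characters in the file at `path`."""
    total = 0
    # Read line by line so large files are not loaded into memory at once.
    with open(path, encoding="utf-8") as f:
        for line in f:
            # str.isalpha() is True for letters, including non-ASCII ones like "é".
            total += sum(1 for ch in line if ch.isalpha())
    return total


if __name__ == "__main__":
    # Hypothetical invocation from CMD: python count_letters.py notes.txt
    if len(sys.argv) > 1:
        print(count_letters(sys.argv[1]))
    else:
        print("usage: python count_letters.py FILENAME")
```

From CMD this would be run as `python count_letters.py yourfile.txt`, with the file name arriving in sys.argv[1].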
5 answers

I would stay away from regular expressions here. They will be slow and ugly. Instead, read the entire file into a string and use the built-in string method count to count characters.

Here it is written out for you:

    filename = raw_input()
    datafile = open(filename, 'r')
    data = datafile.read()
    datafile.close()  # Don't forget to close the file!

    counts = {}  # make sure counts is an empty dictionary
    data = data.lower()  # convert data to lowercase
    for k in range(97, 123):  # letters a to z are ASCII codes 97 to 122
        character = chr(k)  # get the ASCII character from the number
        counts[character] = data.count(character)

Then you have a dictionary counts containing all the counts. For example, counts['a'] gives you the number of times a appears in the file. For the entire list of counts, use counts.items().
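For comparison, here is a rough Python 3 sketch of the same str.count idea (not the answerer's exact code): string.ascii_lowercase replaces the range(97, 123) and chr() loop, and the function name letter_counts is made up for illustration. It expects the whole file as one string, for example the result of f.read().

```python
import string


def letter_counts(text):
    """Count each ASCII letter a-z in `text`, ignoring case, using str.count."""
    lowered = text.lower()
    # string.ascii_lowercase is "abcdefghijklmnopqrstuvwxyz", so no ASCII arithmetic is needed.
    return {ch: lowered.count(ch) for ch in string.ascii_lowercase}


counts = letter_counts("Abba, cab!")
print(counts["a"], counts["z"])
```

Each count call scans the whole string again, so this makes 26 passes over the data. That is fine for small files, while a single pass with Counter scales better.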


The Counter type is useful for counting items. It was added in Python 2.7:

    import collections

    counts = collections.Counter()
    for line in datafile:
        # remove the EOL and iterate over each character
        # if you want the counts to be case-insensitive, replace line.rstrip() with line.rstrip().lower()
        for c in line.rstrip():
            # missing items default to 0, so there is no special code for new characters
            counts[c] += 1

To see the results:

    results = [(key, value) for key, value in counts.items() if key.isalpha()]
    print results
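A Python 3 sketch of the same line-by-line Counter approach, assuming you only want letters: here the isalpha() filter is applied while counting instead of afterwards, so spaces and punctuation never enter the Counter. The function count_alpha and its ignore_case flag are hypothetical names.

```python
import collections


def count_alpha(lines, ignore_case=True):
    """Count alphabetic characters across an iterable of lines (e.g. an open file)."""
    counts = collections.Counter()
    for line in lines:
        if ignore_case:
            line = line.lower()
        # Counter.update() with an iterable adds 1 for each element it yields.
        counts.update(ch for ch in line if ch.isalpha())
    return counts
```

With a real file you would call count_alpha(f) inside a `with open(filename) as f:` block; sorted(counts.items()) then gives an alphabetical listing.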

If the file is small enough to be read into memory all at once, it's very simple:

    from collections import Counter

    filename = raw_input()
    with open(filename) as f:
        data = f.read()
    counter = Counter(data.lower())
    print('\n'.join(str((ch, counter[ch])) for ch in counter if ch.isalpha()))
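The question asks for the number of all alphabetic characters, which is a single total. Assuming str.isalpha() is an acceptable definition of "alphabetic", this Python 3 sketch gets both the total and the per-letter breakdown from one Counter (the helper name summarize is hypothetical):

```python
from collections import Counter


def summarize(text):
    """Return (total number of letters, Counter of lowercase letters) for `text`."""
    # Keep only alphabetic characters while building the Counter.
    letters = Counter(ch for ch in text.lower() if ch.isalpha())
    # The total is the sum of the per-letter counts.
    return sum(letters.values()), letters


total, letters = summarize("Counting letters, not digits: 123")
print(total)
print(letters.most_common(3))
```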

If you want to use regular expressions, you can do the following:

    import re

    # text holds the file contents, e.g. text = datafile.read()
    pattern = re.compile('[^a-zA-Z]+')  # pattern for everything but letters
    only_letters = pattern.sub('', text)  # delete everything else (replacement comes first)
    count = len(only_letters)  # total number of letters

To count each distinct letter separately, use Counter, as already recommended.
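Instead of deleting the non-letters, you can also collect the letters directly with re.findall(). This sketch (the function name regex_letter_stats is hypothetical) returns the total and a case-insensitive Counter. Note that [a-zA-Z] only matches ASCII letters, so accented letters such as "é" are not counted, unlike with str.isalpha().

```python
import re
from collections import Counter

# Matches a single ASCII letter; unlike str.isalpha(), it ignores letters such as "é".
LETTER = re.compile(r"[a-zA-Z]")


def regex_letter_stats(text):
    """Return (total ASCII letters, per-letter Counter) using re.findall."""
    found = LETTER.findall(text)  # list of every single-letter match
    return len(found), Counter(ch.lower() for ch in found)
```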


Regular expressions are useful when you want to find complex patterns in a string. Since you want to count (as opposed to search for) simple "patterns" (only individual alphabetic characters), regular expressions are not the right tool here.

If I understand correctly what you are trying to do, the most transparent way to solve this is to iterate over all the lines, iterate over all the characters in each line, and, if a character is alphabetic, add 1 to the corresponding dictionary entry. In code:

    filename = raw_input()
    found = {}
    with open(filename) as f:
        for line in f:
            for character in line:
                # If you need small and capital letters counted together,
                # "normalize" them to one case first, for example:
                # character = character.lower()
                if character in "abcdefghijklmnopqrstuvwxyz":
                    # Checking `in (explicit string)` is not quick, but transparent.
                    # You can use something like `character.isalpha()` if you want it to
                    # automatically depend on your locale.
                    # If there is no dictionary entry for character yet, assume default 0
                    found[character] = found.get(character, 0) + 1

After this loop has gone through the file, the dictionary found contains the number of occurrences of each character.
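A Python 3 sketch of the same plain-dictionary loop, wrapped in a hypothetical count_letters_by_hand function that accepts any iterable of lines (such as an open file). It lowercases each character before the check, so capital and small letters share one entry, and it uses string.ascii_lowercase so no letter can be accidentally left out of the alphabet string.

```python
import string


def count_letters_by_hand(lines):
    """Tally lowercase letters a-z from an iterable of lines with a plain dict."""
    found = {}
    for line in lines:
        for character in line:
            # Normalize first so "A" and "a" share one entry.
            character = character.lower()
            # Only the 26 ASCII letters pass; digits, punctuation and "é" are skipped.
            if character in string.ascii_lowercase:
                # dict.get(key, 0) supplies 0 the first time a letter is seen.
                found[character] = found.get(character, 0) + 1
    return found
```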


Source: https://habr.com/ru/post/1403151/

