Python-File Parsing

Write a program that reads the text of a file called input.txt containing an arbitrary number of lines of the form "city, country", stores this information in a dictionary, and finally displays the list of countries represented in the file together with the number of cities in each.

For example, if input.txt contains the following:

    New York, US
    Angers, France
    Los Angeles, US
    Pau, France
    Dunkerque, France
    Mecca, Saudi Arabia

The program displays the following (in some order):

    Saudi Arabia : 1
    US : 2
    France : 3

My code is:

    from os import dirname

    def parseFile(filename, envin, envout = {}):
        exec "from sys import path" in envin
        exec "path.append(\"" + dirname(filename) + "\")" in envin
        envin.pop("path")
        lines = open(filename, 'r').read()
        exec lines in envin
        returndict = {}
        for key in envout:
            returndict[key] = envin[key]
        return returndict

I get "SyntaxError: invalid syntax" when I run it with my file name. The file name I used is input.txt.

0
4 answers

I would use a defaultdict of lists to structure the information. That way you keep the city names as well and can derive additional statistics later.

    import collections
    import sys


    def parse_cities(filepath):
        # Map each country to the list of its cities.
        countries_cities_map = collections.defaultdict(list)
        with open(filepath) as fd:
            for line in fd:
                # Strip every field so " US" and "US" end up as the same key.
                values = [value.strip() for value in line.split(',')]
                if len(values) == 2:
                    city, country = values
                    countries_cities_map[country].append(city)
        return countries_cities_map


    def format_cities_per_country(countries_cities_map):
        # items() and single-argument print(...) work on Python 2 and 3.
        for country, cities in countries_cities_map.items():
            print("{ncities} cities found in {country}".format(
                country=country, ncities=len(cities)))


    if __name__ == '__main__':
        filepath = sys.argv[1]
        format_cities_per_country(parse_cities(filepath))
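
As a rough illustration of the "additional statistics" a country-to-cities mapping allows, here is a minimal sketch. The sample lines and the names group_cities, counts, largest, and french_cities are made up for this example and are not part of the answer's code.

```python
import collections

# Hypothetical sample data standing in for the contents of input.txt.
SAMPLE_LINES = [
    "New York, US",
    "Angers, France",
    "Los Angeles, US",
    "Pau, France",
    "Dunkerque, France",
    "Mecca, Saudi Arabia",
]


def group_cities(lines):
    """Group city names by country, skipping malformed lines."""
    grouped = collections.defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 2:
            city, country = fields
            grouped[country].append(city)
    return grouped


grouped = group_cities(SAMPLE_LINES)

# Statistic 1: number of cities per country.
counts = {country: len(cities) for country, cities in grouped.items()}

# Statistic 2: the country with the most cities.
largest = max(grouped, key=lambda country: len(grouped[country]))

# Statistic 3: alphabetical list of the cities of one country.
french_cities = sorted(grouped["France"])

print(counts)
print(largest)
print(french_cities)
```

A plain counter could only give the first statistic; keeping the lists is what makes the other two possible.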
+1

I don't understand what you are trying to do, so I can't explain how to fix it. In particular, why do you exec the lines of the file? And why write exec "foo" instead of just foo? I think you should go back to the basic Python tutorial ...

In any case, you need to do the following:

  • open file using the full path
  • for line in file: process the line and save it in the dictionary
  • return dictionary

That's it, no exec.
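
The three steps above can be sketched roughly as follows. This is only one possible reading of them: the function name count_countries and the temporary demo file are assumptions made for the example, and malformed lines are simply skipped.

```python
import os
import tempfile


def count_countries(path):
    """Return {country: number_of_cities} for a file of "city, country" lines."""
    counts = {}
    # Step 1: open the file using its full path.
    with open(os.path.abspath(path)) as handle:
        # Step 2: process each line and save it in the dictionary.
        for line in handle:
            parts = line.split(",")
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            country = parts[1].strip()
            counts[country] = counts.get(country, 0) + 1
    # Step 3: return the dictionary.
    return counts


# Demo with a temporary file, used here instead of a real input.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("New York, US\nAngers, France\nLos Angeles, US\n")
    demo_path = tmp.name

result = count_countries(demo_path)
os.remove(demo_path)
print(result)
```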

+4

Yes, that is a whole lot of machinery that you either do not need or should not use. Here is how I would do it before Python 2.7 (from 2.7 on, use collections.Counter, as shown in other answers). Keep in mind that this returns a dictionary containing the counts rather than printing them, so you will have to do the printing yourself. I would also prefer not to give a complete solution to homework, but that has already been done, so I figure there is no real harm in explaining it a bit.

    def parseFile(filename):
        with open(filename, 'r') as fh:
            lines = fh.readlines()
        d = {}
        for country in [line.split(',')[1].strip() for line in lines]:
            d[country] = d.get(country, 0) + 1
        return d

Let's break it down a bit, shall we?

    with open(filename, 'r') as fh:
        lines = fh.readlines()

This is how you usually open a text file for reading. It raises an IOError if the file does not exist, if you lack permissions, or the like, so you may want to catch that. readlines() reads the entire file and splits it into lines; each line becomes an element of the list.
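
A minimal sketch of catching that exception is shown below. The helper name read_lines and the choice to return an empty list on failure are assumptions for illustration; a real program might prefer to report the error and exit.

```python
import os
import tempfile


def read_lines(filename):
    """Return the file's lines, or an empty list if it cannot be opened."""
    try:
        with open(filename, 'r') as fh:
            return fh.readlines()
    except IOError as err:  # on Python 3, IOError is an alias of OSError
        print("Could not read {0}: {1}".format(filename, err))
        return []


# Demo: a path inside a fresh, empty directory cannot exist yet.
demo_dir = tempfile.mkdtemp()
missing = read_lines(os.path.join(demo_dir, "missing.txt"))
os.rmdir(demo_dir)
```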

  d={} 

This just initializes an empty dictionary.

  for country in [line.split(',')[1].strip() for line in lines]: 

This is where the fun begins. The bracketed expression on the right is called a list comprehension, and it basically generates a list for you. In plain English it says: "for each element line in the list lines, take that line, split it at each comma, take the second element (index 1) of the list that split returns, strip any surrounding whitespace from it, and use the result as an element of the new list." The left part then simply iterates over the generated list, binding the name country to the current element inside the loop body.
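
To make that reading concrete, here is the comprehension next to an ordinary loop that performs the same steps. The sample lines and the name countries_loop are made up for this sketch.

```python
# Sample input as readlines() would return it (newlines included).
lines = ["New York, US\n", "Angers, France\n", "Los Angeles, US\n"]

# The comprehension from the answer.
countries = [line.split(',')[1].strip() for line in lines]

# The same steps spelled out as an ordinary loop.
countries_loop = []
for line in lines:
    fields = line.split(',')   # split the line at each comma
    second = fields[1]         # take the second element (index 1)
    countries_loop.append(second.strip())  # drop spaces and the newline

print(countries)
print(countries == countries_loop)
```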

  d[country] = d.get(country,0) + 1 

Well, think about what would happen if, instead of that line, we used the following:

  d[country] = d[country] + 1 

It will not work (it raises a KeyError), because d[country] has no value the first time a country is seen. Therefore we use the get() method, which all dictionaries have. Here's the great part: get() takes an optional second argument, which is the value we want back if the element we are looking for does not exist. So instead of failing, it returns 0, which (unlike None) we can add 1 to, and we then update the dictionary with the new count. Finally, we just return the dictionary.
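
The difference between the two forms can be checked with a short experiment. The variable names raised and default_without_arg are only there for the demonstration.

```python
d = {}

# Plain indexing fails on the first occurrence of a key.
try:
    d["France"] = d["France"] + 1
    raised = False
except KeyError:
    raised = True

# get() falls back to the default, so a first occurrence counts as 0 + 1.
for country in ["France", "US", "France"]:
    d[country] = d.get(country, 0) + 1

# Without a second argument, get() returns None for a missing key.
default_without_arg = d.get("Spain")

print(raised)
print(d)
print(default_without_arg)
```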

Hope this helps.

+3
    import collections


    def readFile(fname):
        # Return a list of (city, country) tuples, one per line.
        with open(fname) as inf:
            return [tuple(s.strip() for s in line.split(",")) for line in inf]


    def countCountries(city_list):
        # Count how many cities each country has.
        return collections.Counter(country for city, country in city_list)


    def main():
        cities = readFile("input.txt")
        countries = countCountries(cities)
        print("{0} cities found in {1} countries:".format(len(cities), len(countries)))
        # items() works on Python 2 and 3; iteritems() exists only on Python 2.
        for country, num in countries.items():
            print("{country}: {num}".format(country=country, num=num))


    if __name__ == "__main__":
        main()
+1

Source: https://habr.com/ru/post/1348026/

