Parse a CSV file and sum the values per city

I would like to parse a CSV file and aggregate the values. The values in the CITY column repeat (sample):

    CITY,AMOUNT
    London,20
    Tokyo,45
    London,55
    New York,25

After parsing, the result should look something like this:

    CITY,AMOUNT
    London,75
    Tokyo,45
    New York,25

I wrote the following code to extract unique city names:

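    import csv

    def main():
        # newline='' is what the csv module expects in Python 3 ('rU' was removed in 3.11)
        with open('contributions.csv', newline='') as f:
            contrib_data = list(csv.DictReader(f))
        combined = []  # unique names, in first-seen order
        for row in contrib_data:
            if row['OFFICE'] not in combined:
                combined.append(row['OFFICE'])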
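As a side note, checking "not in combined" on a list scans the whole list for every row. A minimal sketch of the same step using dict.fromkeys, which drops duplicates and keeps first-seen order (guaranteed from Python 3.7). The SAMPLE string and the unique_cities helper are hypothetical, and the key defaults to the CITY column from the sample rather than the OFFICE column in the code above:

```python
import csv
import io

# Hypothetical in-memory copy of the sample data from the question
SAMPLE = "CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n"

def unique_cities(rows, key="CITY"):
    # dict.fromkeys removes duplicates; dicts keep insertion order in Python 3.7+
    return list(dict.fromkeys(row[key] for row in rows))

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(unique_cities(rows))  # ['London', 'Tokyo', 'New York']
```

Pass key="OFFICE" if your real file uses that column name.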

How can I then aggregate the values?

2 answers

Tested in Python 3.2.2:

    import csv
    from collections import defaultdict

    # Sum AMOUNT per CITY; defaultdict(int) starts every new city at 0
    cities = defaultdict(int)
    with open('test.csv', newline='') as f:
        for row in csv.DictReader(f):
            cities[row["CITY"]] += int(row["AMOUNT"])

    # The with block closes (and flushes) out.csv when writing is done
    with open('out.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["CITY", "AMOUNT"])
        writer.writerows([city, total] for city, total in cities.items())

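Result (the row order follows dict iteration order, which is only guaranteed to be insertion order from Python 3.7):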

    CITY,AMOUNT
    New York,25
    London,75
    Tokyo,45
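The defaultdict(int) can equally be a collections.Counter, which also treats a missing key as 0. A minimal sketch that reads from an in-memory copy of the sample instead of test.csv; the SAMPLE string and the helper names sum_by_city and write_totals are illustrative, not part of the answer above:

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory copy of the sample data from the question
SAMPLE = "CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n"

def sum_by_city(f):
    # Counter: a missing city starts at 0, so += works without a check
    totals = Counter()
    for row in csv.DictReader(f):
        totals[row["CITY"]] += int(row["AMOUNT"])
    return totals

def write_totals(totals, out):
    # lineterminator="\n" only to keep the printed output simple
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["CITY", "AMOUNT"])
    writer.writerows(totals.items())

totals = sum_by_city(io.StringIO(SAMPLE))
out = io.StringIO()
write_totals(totals, out)
print(out.getvalue())
```

With Python 3.7+ this prints London,75, Tokyo,45 and New York,25 in first-seen order.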

As for your additional requirements (maximum, minimum and mean per city):

    import csv
    from collections import defaultdict

    def default_factory():
        # [total, max, min, count]; count is replaced by the mean at the end
        return [0, None, None, 0]

    cities = defaultdict(default_factory)
    with open('test.csv', newline='') as f:
        for row in csv.DictReader(f):
            amount = int(row["AMOUNT"])
            stats = cities[row["CITY"]]
            stats[0] += amount
            # Use the built-in max/min instead of shadowing them with variables
            stats[1] = amount if stats[1] is None else max(stats[1], amount)
            stats[2] = amount if stats[2] is None else min(stats[2], amount)
            stats[3] += 1

    for stats in cities.values():
        stats[3] = stats[0] / stats[3]  # calculate mean

    with open('out.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
        writer.writerows([city] + stats for city, stats in cities.items())

This gives:

    CITY,AMOUNT,max,min,mean
    New York,25,25,25,25.0
    London,75,55,20,37.5
    Tokyo,45,45,45,45.0

Note that in Python 2 you also need the line from __future__ import division at the top of the file; without it, / on two integers is integer division, so London's mean 75/2 comes out as 37 instead of 37.5.
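If memory is not a concern, a simpler variant (Python 3.8+, because of statistics.fmean) is to collect all amounts per city and compute the aggregates afterwards. This is a swapped-in technique, not the answer's running-total approach: it keeps every value in memory instead of four numbers per city. The SAMPLE string and the city_stats helper are hypothetical:

```python
import csv
import io
from collections import defaultdict
from statistics import fmean

# Hypothetical in-memory copy of the sample data from the question
SAMPLE = "CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n"

def city_stats(f):
    # Gather every amount per city first
    amounts = defaultdict(list)
    for row in csv.DictReader(f):
        amounts[row["CITY"]].append(int(row["AMOUNT"]))
    # Then derive [total, max, min, mean]; fmean always returns a float
    return {
        city: [sum(vals), max(vals), min(vals), fmean(vals)]
        for city, vals in amounts.items()
    }

stats = city_stats(io.StringIO(SAMPLE))
print(stats["London"])  # [75, 55, 20, 37.5]
```

The rows can be written out with the same csv.writer code as above, e.g. writer.writerows([city] + s for city, s in stats.items()).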


A dict keyed by city, with the running AMOUNT total as the value, does the trick. Something like the following.

Suppose you read one row at a time, where city is the current row's city and amount is its amount:

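    import csv

    main_dict = {}
    # Read one row at a time (file name assumed, same layout as the sample)
    with open('test.csv', newline='') as f:
        for row in csv.DictReader(f):
            city, amount = row["CITY"], int(row["AMOUNT"])
            if city in main_dict:
                main_dict[city] = main_dict[city] + amount
            else:
                main_dict[city] = amount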

At the end of the loop, main_dict holds the aggregated values.
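The if/else branch can be collapsed with dict.get, which returns a default (here 0) for a city that is not in the dict yet. A minimal sketch; the aggregate function and its (city, amount) input pairs are hypothetical stand-ins for the values read from each CSV row:

```python
def aggregate(pairs):
    # pairs: iterable of (city, amount) tuples, e.g. taken from the CSV rows
    main_dict = {}
    for city, amount in pairs:
        # get() returns 0 for an unseen city, replacing the if/else check
        main_dict[city] = main_dict.get(city, 0) + amount
    return main_dict

print(aggregate([("London", 20), ("Tokyo", 45), ("London", 55), ("New York", 25)]))
# {'London': 75, 'Tokyo': 45, 'New York': 25}
```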


Source: https://habr.com/ru/post/1390184/

