Parse the CSV file and summarize the values

Question

I would like to parse a CSV file and aggregate the amounts per city. Values in the CITY column are repeated (sample):

```
CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
```

After parsing, the result should look something like this:

```
CITY,AMOUNT
London,75
Tokyo,45
New York,25
```

I wrote the following code to extract unique city names:

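```python
import csv

def main():
    # Read every row into a list of dicts keyed by the header row
    with open('contributions.csv', newline='') as f:
        contrib_data = list(csv.DictReader(f))
    combined = []
    for row in contrib_data:
        # Keep each name only once, in order of first appearance
        if row['OFFICE'] not in combined:
            combined.append(row['OFFICE'])
```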
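As an aside, the distinct-name loop can be shortened with `dict.fromkeys`: dict keys are unique and, from Python 3.7 on, keep first-seen order. This is a minimal sketch that assumes the CITY/AMOUNT columns of the sample and reads the sample from an in-memory string rather than a file; `SAMPLE` and `unique_cities` are hypothetical names.

```python
import csv
import io

# The sample from the question, held in memory so the sketch runs on its own
SAMPLE = """CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
"""

def unique_cities(f):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so each city appears once, in order of first appearance
    return list(dict.fromkeys(row["CITY"] for row in csv.DictReader(f)))

print(unique_cities(io.StringIO(SAMPLE)))  # ['London', 'Tokyo', 'New York']
```

Besides being shorter, a dict lookup avoids the linear scan that `not in` performs on a list.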

How can I then aggregate the values?

jwesonga Jan 10 '12 at 7:55

2 answers

Using a dict keyed by city, with the accumulated AMOUNT as the value, can do the trick. Something like the following.

Suppose you read one line at a time, where city holds the current row's city and amount holds its amount:

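```python
import csv

main_dict = {}
with open('test.csv', newline='') as f:  # file name is illustrative
    for row in csv.DictReader(f):
        city = row['CITY']
        amount = int(row['AMOUNT'])  # CSV fields are strings; convert before adding
        if city in main_dict:
            main_dict[city] = main_dict[city] + amount
        else:
            main_dict[city] = amount
```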

At the end of the loop, main_dict holds the aggregated amount for each city.
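The same accumulation can be written without the if/else by using `dict.get` with a default of 0. A minimal sketch, assuming the sample's CITY and AMOUNT columns and feeding it from an in-memory string via `io.StringIO`; `aggregate` and `SAMPLE` are hypothetical names.

```python
import csv
import io

SAMPLE = """CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
"""

def aggregate(f):
    main_dict = {}
    for row in csv.DictReader(f):
        city = row["CITY"]
        amount = int(row["AMOUNT"])
        # dict.get returns 0 for a city not seen yet, replacing the if/else
        main_dict[city] = main_dict.get(city, 0) + amount
    return main_dict

print(aggregate(io.StringIO(SAMPLE)))  # {'London': 75, 'Tokyo': 45, 'New York': 25}
```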

Siddharth Jan 10 '12 at 8:22

Tim Pietzcker · Accepted Answer · 2012-01-10T08:21:43+0000

Tested in Python 3.2.2:

```python
import csv
from collections import defaultdict

cities = defaultdict(int)  # a missing city starts at 0
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        cities[row["CITY"]] += int(row["AMOUNT"])

with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT"])
    writer.writerows([city, cities[city]] for city in cities)
```

Result (row order follows dict iteration order; on Python 3.7+ it is London, Tokyo, New York):

```
CITY,AMOUNT
New York,25
London,75
Tokyo,45
```
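`collections.Counter` is a dict subclass whose missing keys also count as 0, so it can stand in for `defaultdict(int)` here and adds `most_common()` for sorting by total. This swaps Counter in for the answer's defaultdict; the sketch reads the sample from memory and writes to an `io.StringIO` instead of out.csv. `totals_by_city`, `write_totals` and `SAMPLE` are hypothetical names.

```python
import csv
import io
from collections import Counter

SAMPLE = """CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
"""

def totals_by_city(f):
    totals = Counter()
    for row in csv.DictReader(f):
        # a missing key counts as 0, like defaultdict(int)
        totals[row["CITY"]] += int(row["AMOUNT"])
    return totals

def write_totals(totals, out):
    writer = csv.writer(out)
    writer.writerow(["CITY", "AMOUNT"])
    # most_common() yields (city, total) pairs from largest to smallest total
    writer.writerows(totals.most_common())

out = io.StringIO()
write_totals(totals_by_city(io.StringIO(SAMPLE)), out)
print(out.getvalue())
```

Sorting by total is optional; iterating over `totals.items()` instead keeps first-seen order, as in the answer.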

As for your additional requirements (per-city max, min and mean):

```python
import csv
from collections import defaultdict

def default_factory():
    # [total, max, min, count]; count is replaced by the mean at the end
    return [0, None, None, 0]

cities = defaultdict(default_factory)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        amount = int(row["AMOUNT"])
        stats = cities[row["CITY"]]
        stats[0] += amount
        stats[1] = amount if stats[1] is None else max(stats[1], amount)
        stats[2] = amount if stats[2] is None else min(stats[2], amount)
        stats[3] += 1

for city in cities:
    cities[city][3] = cities[city][0] / cities[city][3]  # calculate mean

with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
    writer.writerows([city] + cities[city] for city in cities)
```

It gives you:

```
CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0
```

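Note that in Python 2 you need to add `from __future__ import division` at the top; otherwise `/` on two ints performs floor division and London's mean comes out as 37 instead of 37.5. Python 2's `open()` also has no `newline` argument; open the files in binary mode (`'rb'`, `'wb'`) for the csv module there.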
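An alternative to tracking running max/min/count in a four-element list is to collect every amount per city and compute the statistics at the end with the built-ins. This is a different technique from the answer's and keeps all amounts in memory, which is fine for small files; `city_stats` and `SAMPLE` are hypothetical names, and the sample is read from an in-memory string.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
"""

def city_stats(f):
    amounts = defaultdict(list)  # city -> every amount seen for it
    for row in csv.DictReader(f):
        amounts[row["CITY"]].append(int(row["AMOUNT"]))
    # one row per city, in the answer's column order: CITY, AMOUNT, max, min, mean
    return [
        [city, sum(vals), max(vals), min(vals), sum(vals) / len(vals)]
        for city, vals in amounts.items()
    ]

for r in city_stats(io.StringIO(SAMPLE)):
    print(r)
```

To write the rows, pass the result to `csv.writer(...).writerows()` after the same header row as in the answer.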
