Python CSV - need to group and calculate values based on one key

Question

I have a simple 3-column CSV file. Using Python, I need to group the lines by one key, then compute the average of another column for each group and return the results. The file is in standard CSV format, laid out like this:

    ID, ZIPCODE, RATE
    1, 19003, 27.50
    2, 19003, 31.33
    3, 19083, 41.4
    4, 19083, 17.9
    5, 19102, 21.40

So basically I need to calculate the average rate (col [2]) for each unique zip code (col [1]) in this file and return the results: the average rate for all entries in 19003, in 19083, and so on.

I looked at using the csv module to read the file into a dictionary and then sorting the dict on the unique values in the zipcode column, but I have not made any progress.

Any help / suggestions appreciated.

+4

python csv

ply Mar 16 '11 at 17:08

2 answers

Usually, if I have to do complex processing, I use csv to load the rows into a relational database table (sqlite is the quickest way to get one), then I use standard SQL to extract the data and calculate the averages:

    import csv
    from StringIO import StringIO
    import sqlite3

    data = """1,19003,27.50
    2,19003,31.33
    3,19083,41.4
    4,19083,17.9
    5,19102,21.40
    """

    f = StringIO(data)
    reader = csv.reader(f)

    conn = sqlite3.connect(':memory:')
    c = conn.cursor()
    c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
    conn.commit()

    for e in reader:
        e[2] = float(e[2])
        c.execute("""insert into data values (?,?,?)""", e)

    conn.commit()

    c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
    for row in c:
        print row
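The snippet above is Python 2 (`StringIO` module, `print` statement). A minimal Python 3 sketch of the same idea might look like the following. The function name `average_rate_by_zip_sql`, the `SAMPLE` constant, and the `order by` clause are illustrative additions, not part of the original answer.

```python
import csv
import io
import sqlite3

# Sample rows from the question, without the header line.
SAMPLE = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""


def average_rate_by_zip_sql(csv_file):
    """Hypothetical helper: return [(zipcode, average_rate), ...] sorted by zipcode."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table data (ID text, ZIPCODE text, RATE real)")
        # Skip empty rows, strip stray spaces, and convert the rate to a float.
        rows = (
            (row[0].strip(), row[1].strip(), float(row[2]))
            for row in csv.reader(csv_file)
            if row
        )
        conn.executemany("insert into data values (?, ?, ?)", rows)
        cursor = conn.execute(
            "select ZIPCODE, avg(RATE) from data group by ZIPCODE order by ZIPCODE"
        )
        return cursor.fetchall()
    finally:
        conn.close()


for zipcode, average in average_rate_by_zip_sql(io.StringIO(SAMPLE)):
    print(zipcode, round(average, 3))
```

For a real file with a header line, one option is to call `next(csv_file)` before passing the file in, so the header is not inserted as a data row.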

+3

axaroth Mar 16 '11 at 17:58

samplebias · Accepted Answer · 2011-03-16T17:16:25+0000

I have commented each step to help clarify what is going on:

    import csv
    from collections import defaultdict

    # a dictionary whose value defaults to a list.
    data = defaultdict(list)

    # open the csv file and iterate over its rows. the enumerate()
    # function gives us an incrementing row number
    for i, row in enumerate(csv.reader(open('data.csv', 'rb'))):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 which evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, level = row
        # for each zipcode, add the level to the list
        data[zipcode].append(float(level))

    # loop over each zipcode and its list of levels and calculate the average
    for zipcode, levels in data.iteritems():
        print zipcode, sum(levels) / float(len(levels))

Output:

    19102 21.4
    19003 29.415
    19083 29.65
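The accepted code targets Python 2 (`iteritems`, the `print` statement, opening the file in `'rb'` mode). As a rough Python 3 sketch of the same grouping approach, assuming the header line is always present: the helper name `average_rate_by_zip` and the `SAMPLE` constant are hypothetical, and `statistics.mean` is swapped in for the manual `sum / len`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Sample data from the question, including the header and the spaces after commas.
SAMPLE = """ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40
"""


def average_rate_by_zip(csv_file):
    """Hypothetical helper: map each zipcode to the mean of its rates."""
    rates = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(csv_file, skipinitialspace=True)
    next(reader, None)  # skip the header line
    for row in reader:
        if not row:  # ignore empty rows
            continue
        _, zipcode, rate = row
        rates[zipcode].append(float(rate))
    # dicts keep insertion order, so zipcodes appear in first-seen order.
    return {zipcode: mean(values) for zipcode, values in rates.items()}


for zipcode, average in average_rate_by_zip(io.StringIO(SAMPLE)).items():
    print(zipcode, average)
```

To read from disk instead, the same function should accept a file opened with `open('data.csv', newline='')`, which is how the csv module documentation recommends opening files in Python 3. Unlike the Python 2 output above, the zipcodes come out in the order they first appear in the file.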
