You are reading all the rows into a list, and only then processing that list. Don't do that.
Process your rows as you produce them. If you need to filter the data first, use a generator function:
    import csv

    def getstuff(filename, criterion):
        # Python 3: text mode with newline=""; on Python 2 use open(filename, "rb")
        with open(filename, newline="") as csvfile:
            datareader = csv.reader(csvfile)
            yield next(datareader)  # yield the header row
            count = 0
            for row in datareader:
                if row[3] == criterion:
                    yield row
                    count += 1
                elif count:
                    # done when having read a consecutive series of rows
                    return
I also simplified your filter test; the logic is the same, but more concise.
Since you are only matching a single consecutive sequence of rows that meet the criterion, you could also use:
    import csv
    from itertools import dropwhile, takewhile

    def getstuff(filename, criterion):
        # Python 3: text mode with newline=""; on Python 2 use open(filename, "rb")
        with open(filename, newline="") as csvfile:
            datareader = csv.reader(csvfile)
            yield next(datareader)  # yield the header row
            # first row, plus any subsequent rows that match, then stop
            # reading altogether
            # Python 2: use 'for row in takewhile(...): yield row'
            # instead of 'yield from takewhile(...)'.
            yield from takewhile(
                lambda r: r[3] == criterion,
                dropwhile(lambda r: r[3] != criterion, datareader))
            return
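To see what the dropwhile/takewhile combination does in isolation, here is a small sketch on an in-memory list. The sample rows and the criterion value are made up for illustration; only the two itertools calls come from the code above.

```python
from itertools import dropwhile, takewhile

# Hypothetical sample rows; index 3 plays the role of the filter column.
rows = [
    ["a", "b", "c", "x"],
    ["d", "e", "f", "y"],
    ["g", "h", "i", "y"],
    ["j", "k", "l", "z"],
    ["m", "n", "o", "y"],  # a later match, outside the first consecutive run
]
criterion = "y"

# dropwhile skips the leading rows that do not match; takewhile then
# stops at the first row that no longer matches.
skipped = dropwhile(lambda r: r[3] != criterion, rows)
first_run = list(takewhile(lambda r: r[3] == criterion, skipped))

print(first_run)
```

Only the first consecutive run of matching rows is returned; the later matching row is never reached, which is the same behaviour as the count-based version.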
You can now loop over getstuff() directly. Do the same in getdata():
    def getdata(filename, criteria):
        for criterion in criteria:
            for row in getstuff(filename, criterion):
                yield row
Now loop directly over getdata() in your code:
    for row in getdata(somefilename, sequence_of_criteria):
        pass  # process each row here
You now hold only one row in memory at a time, instead of thousands of rows per criterion.
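If you want to try the pieces together, the following is a minimal end-to-end sketch for Python 3. The temporary file, its column names, and the criteria values are invented for the demonstration; getstuff() and getdata() follow the versions above, assuming the filter column is at index 3.

```python
import csv
import os
import tempfile


def getstuff(filename, criterion):
    # Python 3 style open; the csv module expects newline="".
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        count = 0
        for row in datareader:
            if row[3] == criterion:
                yield row
                count += 1
            elif count:
                # done when having read a consecutive series of rows
                return


def getdata(filename, criteria):
    for criterion in criteria:
        yield from getstuff(filename, criterion)


# Hypothetical sample data, written to a temporary file for this demo only.
sample = [
    ["id", "name", "value", "group"],
    ["1", "a", "10", "red"],
    ["2", "b", "20", "blue"],
    ["3", "c", "30", "blue"],
    ["4", "d", "40", "green"],
]
with tempfile.NamedTemporaryFile(
        "w", newline="", suffix=".csv", delete=False) as tmp:
    csv.writer(tmp).writerows(sample)
    path = tmp.name

try:
    # list() is used here only to inspect the result; in real code,
    # loop over getdata() directly so rows are handled one at a time.
    result = list(getdata(path, ["blue", "green"]))
finally:
    os.remove(path)

for row in result:
    print(row)
```

Note that, as written, getstuff() yields the header row once per criterion, so the header appears again before each group of matches.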
yield makes a function a generator function, which means it won't do any work until you start looping over it.
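The lazy behaviour can be checked with a tiny sketch. The numbers() function and the events list are hypothetical names used only to record when the function body actually runs.

```python
events = []  # records when code inside the generator body executes


def numbers():
    events.append("started")
    for i in range(3):
        events.append(f"yielding {i}")
        yield i


gen = numbers()             # creates the generator; the body has not run yet
before = list(events)       # snapshot: nothing recorded so far

first = next(gen)           # runs the body up to the first yield
after_first = list(events)  # snapshot after pulling one value

rest = list(gen)            # pulls the remaining values

print(before, first, after_first, rest)
```

Calling numbers() does nothing beyond creating the generator object; each value is produced only when the consumer asks for it.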