I have two nested for loops that will mostly run over large inputs, and I want to optimize them for speed.
import operator

def rowgetter(*indices):
    # Build a function that extracts the key columns from a row as a tuple.
    if len(indices) == 0:
        return lambda row: tuple()
    elif len(indices) == 1:
        index = indices[0]
        return lambda row: (row[index],)
    else:
        return operator.itemgetter(*indices)

source = [['row1', 'row2', 'row3'],
          ['Product', 'Cost', 'Quantity'],
          ['Test17', '3216', '17'],
          ['Test18', '3217', '18'],
          ['Test19', '3218', '19'],
          ['Test20', '3219', '20']]

variables = ['row2', 'row3']
variables_indices = [1, 2]
key_indices = [0]  # not shown in my original snippet; assume the key is column 0

def melt(source):  # the yield below means this loop runs inside a generator function
    it = iter(source)  # generator object creation
    getkey = rowgetter(*key_indices)
    for row in it:
        k = getkey(row)
        for v, i in zip(variables, variables_indices):
            try:
                o = list(k)
                o.append(v)
                o.append(row[i])
                yield tuple(o)
            except IndexError:
                pass  # skip rows that are too short
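To be explicit, here is how I consume the generator and what it produces (assuming key_indices = [0], since that definition was missing from my snippet above):

for t in melt(source):
    print(t)
# ('row1', 'row2', 'row2')
# ('row1', 'row3', 'row3')
# ('Product', 'row2', 'Cost')
# ('Product', 'row3', 'Quantity')
# ('Test17', 'row2', '3216')
# ('Test17', 'row3', '17')
# ... and so on for Test18 through Test20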
This yields the tuples I want, but it is slow: about 100 seconds on average for 100,000 rows (the example source above has only 5 rows). Can someone help me reduce this time?
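For reference, a rough sketch of how I time it (my real data differs, but it has the same shape, 100,000 rows of three short strings):

import time

big_source = [['row1', 'row2', 'row3']]
big_source += [['Test%d' % n, str(3200 + n), str(n)] for n in range(100000)]

start = time.perf_counter()
count = sum(1 for _ in melt(big_source))
print(count, 'tuples in', time.perf_counter() - start, 'seconds')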
Note: I also tried inlining the loops and a list comprehension (which doesn't yield a result on every iteration); a rough reconstruction of that attempt is below.
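The comprehension variant looked something like this (reconstructed, so details may differ; it builds the whole list in memory instead of yielding one tuple at a time, and it did not help in my case):

def melt_listcomp(source):
    getkey = rowgetter(*key_indices)
    # Tuple concatenation replaces the list/append/tuple dance, and the
    # length check replaces the try/except guard for short rows.
    return [getkey(row) + (v, row[i])
            for row in source
            for v, i in zip(variables, variables_indices)
            if i < len(row)]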