I am trying to move a simple calculation that currently runs in a for loop into a numpy array. In this case, the calculation works on a list of strings of the form:
strings = ['12,34', '56,78'...]
I need to:
Split each string on the comma separator into two integer entries, for example,
strings = [[12, 34], [56, 78]...]
Filter this nested list, keeping only the members that meet some arbitrary criterion, for example, both numbers in the sublist falling within a certain range.
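To make the two steps above concrete, here is a minimal plain-Python sketch. The helper names `parse_pairs` and `filter_pairs`, the sample data, and the range bounds are illustrative assumptions, not part of my real code.

```python
def parse_pairs(strings):
    """Split each 'a,b' string on the comma and convert both parts to int."""
    return [[int(part) for part in s.split(",")] for s in strings]


def filter_pairs(pairs, first_range, second_range):
    """Keep only pairs whose two numbers lie inside the inclusive ranges."""
    (lo1, hi1), (lo2, hi2) = first_range, second_range
    return [p for p in pairs if lo1 <= p[0] <= hi1 and lo2 <= p[1] <= hi2]


# Hypothetical sample input in the same 'a,b' format as above.
strings = ["12,34", "56,78", "40,99"]
pairs = parse_pairs(strings)
kept = filter_pairs(pairs, (0, 50), (50, 100))

print(pairs)
print(kept)
```

Keeping the parsing and the filtering in separate functions makes it easier to time each step on its own.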
I am trying to get to know the numpy library, but I have not been able to benefit from its faster computation without adding overhead when processing the original list. For example, my instinct was to use Python's split() and int() to build the array, but that turned out to be more expensive than the simple for loop.
In addition, I cannot work out how to compose the various numpy operations needed for this on an array created from the original list. Is there a reasonable way to do this, or is it a lost cause for cases like this, where the array is used only once?
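As an illustration of the kind of composed pipeline I am asking about, here is a minimal sketch that lets `np.loadtxt` parse the list of strings directly and then applies a single boolean mask. The sample data, the variable names, and the bounds are assumptions for illustration, and whether this is actually faster than the plain loop would have to be measured.

```python
import numpy as np

# Hypothetical sample input in the same 'a,b' format as the question.
strings = ["12,34", "56,78", "40,99", "7,50"]

# np.loadtxt accepts a list of strings, one row per string, so the
# split-and-convert step is done by numpy rather than by a Python loop.
# ndmin=2 keeps the result two-dimensional even for a single row.
arr = np.loadtxt(strings, delimiter=",", dtype=np.int64, ndmin=2)

# Build one boolean mask from the range conditions on each column.
lon, lat = arr[:, 0], arr[:, 1]
mask = (lon >= 0) & (lon <= 50) & (lat >= 50) & (lat <= 100)

# Apply the mask to the array, and also map it back to the original strings.
filtered_rows = arr[mask]
filtered_strings = [s for s, keep in zip(strings, mask) if keep]

print(filtered_rows)
print(filtered_strings)
```

Mapping the mask back onto the original strings gives the same kind of result as a loop that appends the unparsed string, rather than rows of integers.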
Example code (Python 2):
import random
import datetime as dt
import numpy as np

raw_locs = [str(random.randint(1,100)) + ',' + str(random.randint(1,100))
            for x in xrange(100000)]

if __name__ == '__main__':
    start1 = dt.datetime.now()
    results = []
    for point in raw_locs:
        lon, lat = point.split(",")
        lat = int(lat)
        lon = int(lon)
        if 0 <= lon <= 50 and 50 <= lat <= 100:
            results.append(point)
    end1 = dt.datetime.now()

    start2 = dt.datetime.now()
    converted_list = [map(int, item.split(',')) for item in raw_locs]
    end2 = dt.datetime.now()

    start3 = dt.datetime.now()
    arr = np.array([map(int, item.split(',')) for item in raw_locs])
    end3 = dt.datetime.now()

    start4 = dt.datetime.now()
    results2 = arr[((0 <= arr[:,0]) & (arr[:,0] <= 50)
                    & (50 <= arr[:,1]) & (arr[:,1] <= 100))]
    end4 = dt.datetime.now()

    print "Pure python for whole solution took: {}".format(end1 - start1)
    print "Just python list comprehension prior to array took: {}".format(end2 - start2)
    print "Comprehension + array creation took: {}".format(end3 - start3)
    print "Numpy actual calculation took: {}".format(end4 - start4)
    print "Total numpy time: {}".format(end4 - start3)
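
For anyone re-running the comparison on Python 3, here is a rough sketch of the same two approaches timed with the standard `timeit` module. The function names `pure_python` and `with_numpy`, the fixed seed, the smaller input size, and the repeat count are my assumptions for illustration. No timings are quoted because they depend on the machine and the numpy version.

```python
import random
import timeit

import numpy as np

random.seed(0)  # fixed seed so repeated runs use the same data
raw_locs = ["{},{}".format(random.randint(1, 100), random.randint(1, 100))
            for _ in range(10000)]


def pure_python(points):
    """Parse and filter in a single Python loop, keeping the raw strings."""
    results = []
    for point in points:
        lon, lat = point.split(",")
        lon, lat = int(lon), int(lat)
        if 0 <= lon <= 50 and 50 <= lat <= 100:
            results.append(point)
    return results


def with_numpy(points):
    """Convert to an integer array first, then filter with a boolean mask."""
    arr = np.array([[int(x) for x in p.split(",")] for p in points])
    mask = ((0 <= arr[:, 0]) & (arr[:, 0] <= 50)
            & (50 <= arr[:, 1]) & (arr[:, 1] <= 100))
    return arr[mask]


# Time the whole of each approach, including the conversion overhead.
py_time = timeit.timeit(lambda: pure_python(raw_locs), number=5)
np_time = timeit.timeit(lambda: with_numpy(raw_locs), number=5)

print("Pure Python: {:.4f} s for 5 runs".format(py_time))
print("numpy incl. conversion: {:.4f} s for 5 runs".format(np_time))
```

Timing each approach end to end keeps the array-creation cost inside the numpy measurement, which is the overhead this question is about.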