How to avoid calling a deadly function 30,000 times in a list comprehension?

I have two lists of the same length. I want to check a condition on each item of one list and, where it is true, run a very memory- and CPU-intensive function on the corresponding item of the other list.

My first attempt was this:

records = [(a, deadly_func(b)) for a, b in zip(listA, listB) if a == "condition"]

This immediately allocated all the memory on my desktop and ran for some time before I killed it. Apparently it ran deadly_func(b) on all 30,000 items in listB, whereas the intention was to use the if clause to filter listB down to about 30 items.

I managed to create a working version using

records = [(a, i) for a, i in zip(listA, range(len(listB))) if a == "condition"]
records = [(a, deadly_func(listB[i])) for a, i in records]

Why didn't my first attempt work? Is there a more pythonic way to make this work?


Edit: Thanks for the answers. Here is the actual code for both versions

Does not work:

import shapefile, shapely.geometry as shpgeo

lat = 42.3968243
lon = -71.0313479

sf = shapefile.Reader("/opt/ziplfs/tl_2014_us_zcta510.shp")

records = [(r[0], shpgeo.shape(s.__geo_interface__)) for r, s in zip(sf.records(), sf.shapes()) if haversine(lon, lat, float(r[8]), float(r[7])) < 10]

where haversine() is a haversine function that returns the distance between two lat/long points:

from math import sqrt, sin, cos, radians, asin
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees). Return is in kilometers
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

The shapefile ('tl_2014_us_zcta510.shp') contains all ZIP code areas from the US Census Bureau. The file is roughly 800 MB.

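The script should return a list of tuples covering every ZIP code within 10 km of the given point, each paired with its shape as a shapely geometry.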

Works:

records = [(r[0], i) for r, i in zip(sf.records(), range(len(sf.records()))) if haversine(lon, lat, float(r[8]), float(r[7])) < 10]
shapes = [shpgeo.shape(sf.shape(i).__geo_interface__) for r, i in records]

I ran some timing tests. The "does not work" version:

$ python test.py 
Time Elapsed: 0:00:14.221533
$ python test.py 
Time Elapsed: 0:00:14.637827
$ python test.py 
Time Elapsed: 0:00:14.253425

The working version:

$ python test.py 
Time Elapsed: 0:00:01.887987
$ python test.py 
Time Elapsed: 0:00:01.886635
$ python test.py 
Time Elapsed: 0:00:01.982547

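So "does not work" was an overstatement: that version does finish, but it is far slower, even though only about 30 records match.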
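The "Time Elapsed" lines above look like the string form of a datetime.timedelta. A minimal sketch of how such a measurement might be taken follows; the timed() helper and the use of sum() as the workload are assumptions for illustration, not the harness actually used in test.py.

```python
from datetime import datetime


def timed(func, *args):
    """Run func(*args) and return (result, elapsed), where elapsed is a timedelta."""
    start = datetime.now()
    result = func(*args)
    return result, datetime.now() - start


# Hypothetical workload standing in for building the records list.
result, elapsed = timed(sum, range(1000000))
print("Time Elapsed: {}".format(elapsed))
```

Wrapping each version of the comprehension in a function and passing it to timed() would give comparable numbers for both.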

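Are you sure? Your first attempt does not call deadly_func on every item in listB. It is called only for the items whose listA counterpart is True: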

listA = [True, False, True, False]
listB = [1, 2, 3, 4]

def deadly_func(x):
    print("Called with {}".format(x))
    return x

print([(a, deadly_func(b)) for a, b in zip(listA, listB) if a])

# Output:
# Called with 1
# Called with 3
# [(True, 1), (True, 3)]

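I suspect the problem in your real code is that sf.shapes() reads every shape in the file up front, before any filtering happens. Calling sf.shape(i) only for the indices you need loads just those shapes, which would explain why your second version is fast.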

If so, you can do it in a single comprehension:

records = [(r[0], shpgeo.shape(sf.shape(i).__geo_interface__)) for i, r in enumerate(sf.records()) if haversine(lon, lat, float(r[8]), float(r[7])) < 10]
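The difference between loading everything and loading on demand can be shown with a small stand-in. This is only a sketch: FakeReader and its loaded counter are hypothetical and do not reproduce the pyshp API. They simply mimic a reader whose shapes() parses every item while shape(i) parses one.

```python
class FakeReader:
    """Hypothetical stand-in for a shapefile reader; not the pyshp API."""

    def __init__(self, n):
        self.n = n
        self.loaded = 0  # number of shapes parsed so far

    def records(self):
        # Cheap metadata: (index, matches_filter) for each shape.
        return [(i, i % 1000 == 0) for i in range(self.n)]

    def shape(self, i):
        # Expensive: parse a single shape.
        self.loaded += 1
        return "shape-{}".format(i)

    def shapes(self):
        # Expensive: parse every shape and return them all as a list.
        return [self.shape(i) for i in range(self.n)]


# zip() evaluates its arguments first, so shapes() parses all 30,000 items
# before the if clause ever filters anything.
eager = FakeReader(30000)
hits_eager = [(r[0], s) for r, s in zip(eager.records(), eager.shapes()) if r[1]]

# Here shape(i) runs only for the records that pass the filter.
lazy = FakeReader(30000)
hits_lazy = [(r[0], lazy.shape(i)) for i, r in enumerate(lazy.records()) if r[1]]

print(eager.loaded, lazy.loaded, hits_eager == hits_lazy)
# Output:
# 30000 30 True
```

Both comprehensions produce the same result; only the amount of work done before filtering differs.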

For something like this, I would just use a plain for loop.

For example:

things = []

for a, b in zip(listA, listB):
    if a == "condition":
        things.append((a, deadly_func(b)))

A plain loop like this is easier to read and debug.

In addition, you can limit the input to the first 1000 items while debugging:

for a, b in zip(listA[:1000], listB[:1000]):
    ...
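In Python 3, zip() returns an iterator, which cannot be sliced with [:1000]. One way to take only the first pairs without copying either list is itertools.islice. The sketch below uses made-up sample data and a placeholder deadly_func, both assumptions for illustration.

```python
from itertools import islice

# Made-up sample data: every third item satisfies the condition.
listA = ["condition" if i % 3 == 0 else "other" for i in range(5000)]
listB = list(range(5000))


def deadly_func(x):
    # Placeholder for the expensive call.
    return x * 2


things = []
# islice stops after the first 1000 pairs; the rest of the input is never visited.
for a, b in islice(zip(listA, listB), 1000):
    if a == "condition":
        things.append((a, deadly_func(b)))

print(len(things))
# Output:
# 334
```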

Source: https://habr.com/ru/post/1686598/

