Python and list performance

Suppose you have a list comprehension in Python, for example

Values = [ f(x) for x in range( 0, 1000 ) ] 

where f is simply a function with no side effects, so each element can be computed independently of the others.

Is it possible for Python to speed up this list comprehension relative to the "obvious" sequential implementation, for example by running it in parallel across multiple cores with shared memory?

+6
4 answers

Python 3.2 added concurrent.futures, a nice library for running tasks concurrently. Consider this example:

import math, time
from concurrent import futures

PRIMES = [112272535095293, 112582705942171, 112272535095293,
          115280095190773, 115797848077099, 1099726899285419,
          112272535095293, 112582705942171, 112272535095293,
          115280095190773, 115797848077099, 1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def bench(f):
    start = time.time()
    f()
    elapsed = time.time() - start
    print("Completed in {} seconds".format(elapsed))

def concurrent():
    with futures.ProcessPoolExecutor() as executor:
        values = list(executor.map(is_prime, PRIMES))

def listcomp():
    values = [is_prime(x) for x in PRIMES]

Results on my quad-core machine:

>>> bench(listcomp)
Completed in 14.463825941085815 seconds
>>> bench(concurrent)
Completed in 3.818351984024048 seconds
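
One practical caveat: ProcessPoolExecutor workers must be able to import the function they execute, so on platforms that start workers with the spawn method (Windows, and macOS by default on newer Python versions) the pool should be created under an if __name__ == "__main__": guard. A minimal sketch, with f standing in for any side-effect-free function:

def f(x):
    # Stand-in for any pure, side-effect-free function.
    return x * x

if __name__ == "__main__":
    from concurrent import futures
    # The guard lets spawned worker processes import this module
    # without re-running the pool creation recursively.
    with futures.ProcessPoolExecutor() as executor:
        values = list(executor.map(f, range(1000)))
    print(values[:5])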
+7

No, Python will not magically parallelize this for you. In fact, it cannot, because it cannot prove that the elements are independent; doing so would require deep program analysis, which is impossible in the general case.

If you need fast coarse-grained multi-core parallelism, I recommend joblib:

from joblib import delayed, Parallel

values = Parallel(n_jobs=NUM_CPUS)(delayed(f)(x) for x in range(1000))

Not only have I seen near-linear speedups using this library, it also has the great feature of forwarding signals such as Ctrl-C to its worker processes, which cannot be said of all multiprocessing libraries.

Note that joblib does not actually support shared-memory parallelism: it spawns worker processes, not threads, so there is some overhead from sending data to the workers and shipping results back to the main process.
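
As a concrete, runnable variant of the snippet above, n_jobs=-1 asks joblib to use all available cores; f is again a stand-in for any side-effect-free function:

from joblib import Parallel, delayed

def f(x):
    # Stand-in for any pure, side-effect-free function.
    return x * x

if __name__ == "__main__":
    # n_jobs=-1 uses all available CPU cores; workers are separate
    # processes, so f must live at module level to be picklable.
    values = Parallel(n_jobs=-1)(delayed(f)(x) for x in range(1000))
    print(values[:5])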

+8

Would map() be faster in this scenario? First, the list comprehension:

% python -m timeit '
f = lambda x: 2 * x
Values = [ f(x) for x in range( 0, 1000 ) ]
'
1000 loops, best of 3: 263 usec per loop

And with map() there is no real performance gain:

% python -m timeit '
f = lambda x: 2 * x
Values = map(f, range( 0, 1000 ))
'
1000 loops, best of 3: 255 usec per loop
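
Note that these timings come from Python 2, where map() returns a list. In Python 3 map() is lazy, so a fair comparison has to materialize the result inside the timed statement; a sketch using the timeit module (absolute numbers will vary by machine):

import timeit

f = lambda x: 2 * x

# In Python 3, map() is lazy: wrap it in list() so the work is
# actually performed inside the timed statement.
listcomp_time = timeit.timeit(lambda: [f(x) for x in range(1000)], number=1000)
map_time = timeit.timeit(lambda: list(map(f, range(1000))), number=1000)
print("list comprehension:", listcomp_time)
print("map + list:", map_time)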
+1

Try whether the following is faster:

 Values = map(f,range(0,1000)) 

This is the functional style of coding.

Another idea is to replace all occurrences of Values in the code with the lazy equivalent:

imap(f, range(0, 1000))  # Python < 3 (imap comes from itertools)
map(f, range(0, 1000))   # Python 3
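
If the values are only consumed once, this lazy form avoids building the whole list up front; when a real list is needed, wrap the call in list(). A small Python 3 sketch, with f again a stand-in for any side-effect-free function:

def f(x):
    # Stand-in for any pure, side-effect-free function.
    return 2 * x

lazy_values = map(f, range(1000))    # no work done yet
total = sum(lazy_values)             # items computed on demand as sum() iterates

values = list(map(f, range(1000)))   # materialize when an actual list is required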
+1

Source: https://habr.com/ru/post/889618/

