Parallel processes (created by joblib) require data to be copied to and from each worker. Think of it this way: two people each carry a stone to their own house, polish it, and carry it back. That is slower than one person polishing the stone where it lies.
Almost all the time goes into moving data back and forth rather than into the actual calculation. Parallel processes only pay off for computationally heavier tasks.
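To make the copying cost concrete without starting any processes, here is a rough sketch using only the standard library. It assumes that pickling each argument and result is a reasonable stand-in for the per-task copying a process pool does. This is a lower bound: real worker pools also pay for inter-process communication, scheduling, and worker startup, none of which is measured here. The function names are hypothetical.

```python
import math
import pickle
import timeit

N = 100_000
values = list(range(N))

def compute_only():
    # The actual work: one square and one square root per element.
    return [math.sqrt(i ** 2) for i in values]

def copy_round_trip():
    # Rough stand-in (an assumption, not joblib's real code path) for the
    # per-task copying a process pool performs: serialize each input,
    # deserialize it "in the worker", compute, then serialize and
    # deserialize the result on the way back.
    results = []
    for i in values:
        x = pickle.loads(pickle.dumps(i))
        r = math.sqrt(x ** 2)
        results.append(pickle.loads(pickle.dumps(r)))
    return results

t_compute = timeit.timeit(compute_only, number=1)
t_copy = timeit.timeit(copy_round_trip, number=1)
print(f"compute only:         {t_compute * 1e3:.1f} ms")
print(f"with copy round trip: {t_copy * 1e3:.1f} ms")
```

Compare the two numbers on your own machine: when the serialization alone rivals or exceeds the computation, splitting the work across processes cannot help.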
If you want to speed up this specific operation, use NumPy's vectorized math operations. On my machine the timings were: parallel 1.13 s, serial 54.6 ms, NumPy 3.74 ms.
    import numpy as np

    a = np.arange(100000, dtype=np.int64)  # np.int was removed in NumPy 1.24
    np.sqrt(a ** 2)
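The serial version being compared is not shown above. The sketch below assumes it was a plain Python loop calling `math.sqrt` on each element, and it times that loop against the vectorized NumPy version. The helper names `serial_sqrt` and `numpy_sqrt` are hypothetical, and absolute timings will differ from the ones quoted.

```python
import math
import timeit

import numpy as np

N = 100_000

def serial_sqrt(n=N):
    # Assumed serial baseline: one interpreted Python iteration per element.
    return [math.sqrt(i ** 2) for i in range(n)]

def numpy_sqrt(n=N):
    # Vectorized: the square and the square root each run as a single
    # loop inside NumPy's compiled code over the whole array.
    a = np.arange(n, dtype=np.int64)
    return np.sqrt(a ** 2)  # int64 input, float64 output

# Both approaches produce the same values (0.0, 1.0, 2.0, ...).
assert np.array_equal(np.asarray(serial_sqrt()), numpy_sqrt())

# Average over 10 calls each to smooth out noise.
t_serial = timeit.timeit(serial_sqrt, number=10) / 10
t_numpy = timeit.timeit(numpy_sqrt, number=10) / 10
print(f"serial: {t_serial * 1e3:.2f} ms per call")
print(f"numpy:  {t_numpy * 1e3:.2f} ms per call")
```

The NumPy version wins because the per-element work never goes back through the Python interpreter, which is the same overhead the serial loop pays on every iteration.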
Don't bother with libraries like Cython or Numba here; they will not noticeably speed up an operation that NumPy already runs as compiled, vectorized code.