The fastest way to sort multiple lists - Python

I have two lists, x and y, and I want to sort x and rearrange y by rearranging x-sort. For example, given

x = [4, 2, 1, 3] y = [40, 200, 1, 30] 

I want to receive

 x_sorted = [1,2,3,4] y_sorted = [1, 200, 30, 40] 

As discussed in past questions, an easy way to solve this is

 x_sorted, y_sorted = zip(*sorted(zip(x,y))) 

Here is my question: what is the fastest way to do this?


I have three methods for completing a task.

 import numpy as np x = np.random.random(1000) y = np.random.random(1000) 

method 1:

 x_sorted, y_sorted = zip(*sorted(zip(x,y))) #1.08 ms 

method 2:

 foo = zip(x,y) foo.sort() zip(*foo) #1.05 ms 

method 3;

 ind = range(1000) ind.sort(key=lambda i:x[i]) x_sorted = [x[i] for i in ind] y_sorted = [y[i] for i in ind] #934us 

Is there a better method that runs faster than the three methods?


Additional questions.

  • Why is method 2 not faster than method 1, although it uses a sort method?
  • If I execute method 2 separately, it runs faster. In IPython terminal

I have

 %timeit foo = zip(x,y) #1000 loops, best of 3: 220 us per loop %timeit foo.sort() #10000 loops, best of 3: 78.9 us per loop %timeit zip(*foo) #10000 loops, best of 3: 73.8 us per loop 
+6
source share
3 answers

Using numpy.argsort :

 >>> import numpy as np >>> x = np.array([4,2,1,3]) >>> y = np.array([40,200,1,30]) >>> order = np.argsort(x) >>> x_sorted = x[order] >>> y_sorted = y[order] >>> x_sorted array([1, 2, 3, 4]) >>> y_sorted array([ 1, 200, 30, 40]) 

 >>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000) 0.030632019043 

Note

This makes sense if the input is already numpy arrays.

+7
source
 >>> x = [4, 2, 1, 3] >>> y = [40, 200, 1, 30] >>> x_sorted, y_sorted = zip(*sorted(zip(x, y), key=lambda a:a[0])) >>> x_sorted (1, 2, 3, 4) >>> y_sorted (1, 200, 30, 40) 

Performance:

 >>> timeit('foo = zip(x,y); foo.sort(); zip(*foo)', 'from __main__ import x, y', number=1000) 1.0197240443760691 >>> timeit('zip(*sorted(zip(x,y)))', 'from __main__ import x, y', number=1000) 1.0106219310922597 >>> timeit('ind = range(1000); ind.sort(key=lambda i:x[i]); x_sorted = [x[i] for i in ind]; y_sorteds = [y[i] for i in ind]', 'from __main__ import x, y', number=1000) 0.9043525504607857 >>> timeit('zip(*sorted(zip(x, y), key=lambda a:a[0]))', 'from __main__ import x, y', number=1000) 0.8288150863453723 

To see the full picture:

 >>> timeit('sorted(x)', 'from __main__ import x, y', number=1000) 0.40415491505723367 # just getting sorted list from x >>> timeit('x.sort()', 'from __main__ import x, y', number=1000) 0.008009909448446706 # sort x inplace 

@Falsetru method - fast for np.arrays

 >>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000) 0.05441799872323827 

As @AshwiniChaudhary suggested in the comments for lists , there is a way to speed it up using itertools.izip instead of zip :

 >>> timeit('zip(*sorted(izip(x, y), key=itemgetter(0)))', 'from __main__ import x, y;from operator import itemgetter;from itertools import izip', number=1000) 0.4265049757161705 
+4
source

You are not observing this correctly

 %timeit foo.sort() 

After the first cycle, it is already sorted for the remainder. Timsort is very effective for pre-sorted lists.

I was a little surprised that using the @Roman key function was much faster. You can improve it using itemgetter

 from operator import itemgetter ig0 = itemgetter(0) zip(*sorted(zip(x, y), key=ig0)) 

This is about 9% faster than using the lambda function for lists of 1000 elements

+4
source

Source: https://habr.com/ru/post/952171/


All Articles