Python Sort on a fly

I am thinking of a problem that I have not encountered before, and I am trying to determine the most efficient algorithm to use.

I repeat two lists, using each pair of elements to calculate the value I want to sort. My ultimate goal is to get the twenty best results. I could store the results in the third list, sort this list by absolute value and just slice the top twenty, but this is not ideal.

Since these lists can become extremely large, I would ideally want to store only the twenty best absolute values, digitizing the old values ​​when a new upper value is computed.

What would be the most efficient way to implement this in python?

+6
source share
4 answers

Take a look at heapq.nlargest :

heapq. nlargest ( n, iterable[, key] )

Returns a list with n largest elements from the dataset defined iterable. key, if provided, sets the function of one argument, which is used to extract the comparison key from each element in the iterable: key=str.lower Equivalent: sorted(iterable, key=key, reverse=True)[:n]

+11
source

You can use izip to iterate over two lists in parallel and build a generator for lazy computation over them, and then heapq.nlargest to efficiently store the top n :

 from itertools import izip import heapq list_a = [1, 2, 3] list_b = [3, 4, 7] vals = (abs(a - b) for a, b in izip(list_a, list_b)) print heapq.nlargest(2, vals) 
+4
source

You have a list of 20 options, initialized with a minimal calculation result and two -1 indices. When calculating the result, add it to the list of results with the indices of the pair, sort by value and crop the list to length 20. It should be reasonably efficient, since you only sort the list by length 21.

+1
source

I know that the best answer has already been chosen, but for educational purposes you can also consider mine.

I hope there are no typos:

 def some_name(list_a, list_b): if len(list_a) != len(list_b): raise Exception("Too bad") result_list = [] for result in (list_a[i] + list_b[i] for i in range(len(list_a))): if len(result_list) >= 20: if result_list[0] > result: continue result_list = result_list[1:] result_list.append(result) result_list.sort() 

And after some refactoring - it does almost what heapq.nlargest will do (here we must save the results, sorted by themselves):

 def some_name(list_a, list_b): if len(list_a) != len(list_b): raise Exception("Too bad") result_list = [] for result in (list_a[i] + list_b[i] for i in range(len(list_a))): result_list.append(result) result_list.sort() result_list = result_list[-20:] 
+1
source

Source: https://habr.com/ru/post/948072/


All Articles