What is a quick and pythonic / clean way to remove one sorted list from another sorted list in Python?

I am writing a function that generates a list of primes in the range (0, limit + 1). Inside this function, I remove every integer in the list named removable from the list named primes. I am looking for a quick and pythonic way to do this removal, given that both lists are always sorted.

Maybe I'm wrong, but I believe that list.remove(n) iterates through the list, comparing each element with n, so the following code runs in O(n^2) time:

    # removable and primes are both sorted lists of integers
    for composite in removable:
        primes.remove(composite)
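For concreteness, here is that naive loop run end to end on small sample inputs (the lists below are invented for illustration, not taken from the actual sieve):

```python
# Illustrative data: a sorted candidate list with the composites 9 and 15 mixed in.
primes = [2, 3, 5, 7, 9, 11, 13, 15]
removable = [9, 15]

# Each remove() call scans primes from the front until it finds the value,
# then shifts everything after it down by one slot.
for composite in removable:
    primes.remove(composite)

print(primes)  # [2, 3, 5, 7, 11, 13]
```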

Based on my assumption (which may be incorrect, so please confirm whether it is correct or not), and the fact that both lists are always sorted, I think the following code runs faster because it only walks through each list once, in O(n) time. However, it is not at all pythonic or clean.

    i = 0
    j = 0
    while i < len(primes) and j < len(removable):
        if primes[i] == removable[j]:
            primes = primes[:i] + primes[i+1:]
            j += 1
        else:
            i += 1

Perhaps there is a built-in function or an easier way to do this? And what is the fastest way?

Side notes: I have not actually timed the functions or code above. Also, it does not matter if the list is modified / mutated in the process.

For anyone interested, the full function is below:

    import math

    # returns a list of primes in range(0, limit+1)
    def fastPrimeList(limit):
        if limit < 2:
            return list()
        sqrtLimit = int(math.ceil(math.sqrt(limit)))
        # list(...) is needed on Python 3, where range() is lazy
        primes = [2] + list(range(3, limit + 1, 2))
        index = 1
        while primes[index] <= sqrtLimit:
            removable = list()
            index2 = index
            while primes[index] * primes[index2] <= limit:
                composite = primes[index] * primes[index2]
                removable.append(composite)
                index2 += 1
            for composite in removable:
                primes.remove(composite)
            index += 1
        return primes
2 answers

This is pretty quick and clean: it performs O(n) membership checks and runs in O(n) amortized time overall (the first line is O(n) amortized, and the second line is O(n * 1) amortized because each set membership check is O(1) amortized):

    removable_set = set(removable)
    primes = [p for p in primes if p not in removable_set]
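A quick sanity check of the set-based version on made-up sample lists (the data below is illustrative, not from the question):

```python
# Illustrative data: sorted candidate list containing the composites 9 and 15.
primes = [2, 3, 5, 7, 9, 11, 13, 15]
removable = [9, 15]

removable_set = set(removable)  # O(M) to build the set
# O(N) pass; each "not in" lookup is O(1) amortized
primes = [p for p in primes if p not in removable_set]

print(primes)  # [2, 3, 5, 7, 11, 13]
```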

Here is a modification of your second solution. It performs O(n) elementary operations in the worst case:

    tmp = []
    i = j = 0
    while i < len(primes) and j < len(removable):
        if primes[i] < removable[j]:
            tmp.append(primes[i])
            i += 1
        elif primes[i] == removable[j]:
            i += 1
        else:
            j += 1
    primes[:i] = tmp
    del tmp
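The same merge-style loop, run end to end on small illustrative inputs (invented data, chosen so both branches of the comparison are exercised):

```python
primes = [2, 3, 5, 7, 9, 11, 13, 15]   # made-up sorted input
removable = [9, 15]

tmp = []
i = j = 0
while i < len(primes) and j < len(removable):
    if primes[i] < removable[j]:
        tmp.append(primes[i])   # keep: smaller than the next removable value
        i += 1
    elif primes[i] == removable[j]:
        i += 1                  # drop the matching element
    else:
        j += 1                  # this removable value is not in primes; skip it
primes[:i] = tmp                # splice the survivors back in place

print(primes)  # [2, 3, 5, 7, 11, 13]
```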

Note that constant factors also matter. The Python interpreter is rather slow (i.e., it has a large constant) at executing Python-level code. The second solution consists of a lot of Python code, so it can be slower for small practical values of n than the set-based solution, since set operations are implemented in C and are therefore fast (i.e., have a small constant).

If you have several working solutions, run them on typical input sizes and measure the times. You may be surprised by their relative speeds; it is often not what you would predict.
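One way to do that measurement is with the standard timeit module. A minimal sketch, where the input sizes are arbitrary and the reported time depends entirely on your machine:

```python
import timeit

# Build sorted inputs once per measurement; sizes here are arbitrary.
setup = """
import random
random.seed(0)
primes = sorted(random.sample(range(10**6), 10_000))
removable = sorted(random.sample(primes, 2_000))
"""

set_version = """
removable_set = set(removable)
result = [p for p in primes if p not in removable_set]
"""

t = timeit.timeit(set_version, setup=setup, number=100)
print(f"set version: {t:.4f}s for 100 runs")  # machine-dependent
```

Timing the merge-based loop is a matter of swapping in its code as another statement string and comparing the two numbers.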


The most important thing here is to remove the quadratic behavior. You have this for two reasons.

First, the remove call searches the entire list for the value to remove. That takes linear time, and you do it once for each element in removable , so your total time is O(NM) (where N is the length of primes and M is the length of removable ).

Second, removing an element from the middle of a list forces you to shift the whole rest of the list down by one slot. That also takes linear time, and again you do it M times, so again it is O(NM) .


How can you avoid this?

For the first, you need to either exploit the sorted order, or just use something that allows constant-time lookup instead of linear-time lookup, such as a set .

For the second, you can either build a list of indexes to delete and then make a second pass that shifts each element down by the appropriate number of slots in one go, or simply build a new list instead of trying to modify the original in place.
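A minimal sketch of the "build a new list" variant, using a hypothetical set of indexes to delete (the data and indexes below are invented for illustration):

```python
primes = [2, 3, 5, 7, 9, 11, 13, 15]
to_delete = {4, 7}  # hypothetical indexes of the values 9 and 15

# Single pass: copy every element whose index is not marked for deletion.
primes = [p for i, p in enumerate(primes) if i not in to_delete]

print(primes)  # [2, 3, 5, 7, 11, 13]
```

Because the result is a fresh list, no element is ever shifted more than once, which removes the second source of quadratic behavior.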

So, there are many options. Which one is best? It almost certainly does not matter; changing your O(NM) to O(N+M) is likely enough of an optimization that you will be happy with the results. But if you need to squeeze out more performance, you will have to implement all of them and benchmark them on realistic data.

The only one that I think is not obvious is how to "exploit the sorted order". The idea is to use the same kind of lock-step iteration you would use in the merge step of a merge sort, for example:

    def sorted_subtract(seq1, seq2):
        # yield the elements of sorted seq1 that do not appear in sorted seq2
        i1, i2 = 0, 0
        while i1 < len(seq1):
            if i2 == len(seq2) or seq1[i1] < seq2[i2]:
                yield seq1[i1]   # smaller than anything left in seq2: keep it
                i1 += 1
            elif seq1[i1] == seq2[i2]:
                i1 += 1          # matched: skip it
            else:
                i2 += 1          # seq2 value not present in seq1: move past it
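Here is a self-contained sketch of such a merge-style subtraction, with sample data invented for illustration (the definition is repeated so the snippet runs on its own):

```python
def sorted_subtract(seq1, seq2):
    # Yield the elements of sorted seq1 that do not appear in sorted seq2.
    i1, i2 = 0, 0
    while i1 < len(seq1):
        if i2 == len(seq2) or seq1[i1] < seq2[i2]:
            yield seq1[i1]
            i1 += 1
        elif seq1[i1] == seq2[i2]:
            i1 += 1
        else:
            i2 += 1

print(list(sorted_subtract([2, 3, 5, 7, 9, 11, 13, 15], [9, 15])))
# [2, 3, 5, 7, 11, 13]
```

Being a generator, it produces the surviving elements lazily; wrap it in list(...) if you need the result as a list.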

Source: https://habr.com/ru/post/1203260/

