The worst case for QuickSort - when can this happen?

In QS analysis, everyone always refers to the “almost sorted” worst case. When is such a natural-entry scenario possible?

The only example I came up with is reindexing.

+41
algorithm quicksort
Mar 10 '10 at 7:27
source share
6 answers

I think people confuse the Quicksort partition-based sorting algorithm and the "qsort" various library implementations.

I prefer the Quicksort algorithm to use a selectable selection algorithm, which is very important for analyzing its behavior.

If the first item is always selected as the reference item, then an already sorted list is the worst. Often there is a high probability that the array is already / almost sorted, so this implementation is pretty bad.

Similarly, choosing the last element as the rotation axis is bad for the same reason.

Some implementations try to avoid this problem by choosing the middle element as the fulcrum. This did not affect the already / almost sorted arrays so much, but you could still construct an input that will use this predictable choice of rotation and make it work in quadratic time.

Thus, you get randomized algorithms for selecting control points, but even this does not guarantee O(N log N) .

So, other algorithms were developed that would use some information from the sequence before collecting the rod. You can, of course, scan the entire sequence and find the median and use it as a rod. This guarantees O(N log N) , but, of course, slower in practice.

So, some angles were cut off, and people developed the median 3 algorithm. Of course, later even this was available by the so-called median of 3 killers.

Thus, additional attempts are made to develop more “intelligent” control points selection algorithms that guarantee the asymptotic behavior of O(N log N) , which is still fast enough to be practical with varying degrees of success.

So, unless a specific implementation of Quicksort is specified, the question of when the worst case scenario occurs is undefined. If you use the so-called reference median selection algorithm, there is no quadratic worst case scenario.

However, most library implementations are likely to lose the O(N log N) guarantee for faster sorting in the middle case. Some of the really old implementations use the first element as a core, which is now well understood as poor and is no longer a widely used practice.

+41
Mar 10 '10 at 8:45
source share

I believe the worst case for quicksort depends on the choice of the rotation element at each step. Quicksort has the worst performance if the rotation axis can be either the smallest or the largest element in the list (for example, the first or last element of an already sorted list).

If, for example, you select the middle element of a list, the already sorted list does not have the worst execution time.

So, if you suspect that your script is likely to have a bad script for quick sorting, you can simply change your control selection to make quicksort better.

Note. I know that this did not give more real-life examples for the quicksort worst cases. Examples of this depend on the implementation you are working with.

+33
Mar 10 '10 at 7:45
source share

Actual question: "When can such a scenario (almost sorted) occur with natural input?"

Although all answers relate to “what leads to the worst result,” none of them cover “what leads to the fact that the data matches the worst-case performance scenario.”

So, to answer the actual question

  • Programmer's mistake: basically you get up by sorting the list twice. This usually happens because the list is sorted by one place in the code. And later in another piece of code, you know that you need a list to sort, so you sort it again.

  • Using almost chronological data: you have data that is usually taken in chronological order, but sometimes some elements are out of position. (Consider a multi-threaded environment that adds items with a timestamp to the list. Race conditions can lead to the addition of items in a different order in which they were marked by time.) In this situation, if you need sorted data, -Sort. Since the data order is not guaranteed.

  • Adding items to the list: if you have a sorted list and just add some items (i.e. without using binary insertion). You will need to sort the sorted list.

  • Data from an external source: if you receive data from an external source, there can be no guarantee that it will be sorted. Therefore, you yourself sort it. However, if the external source is sorted, you will re-sort the data.

  • Natural order: this is similar to historical data. Basically, the natural order of the data you receive can be sorted. Consider an insurance company adding car registration. If the car registration authority does this in a predictable manner, new cars will most likely not guarantee higher registration numbers. Since you are not guaranteed to be sorted, you will have to re-sort.

  • Interleaved data: if you get data from several sorted sources with overlapping keys, you can get keys similar to the following: 1 3 2 5 4 7 6 9 8 11 10 13 12 15 14 17 16 19 18. Even though half the elements do not match sequence with its neighbor, the list is "almost sorted". Of course, using QuickSort, which pivots on the first item, demonstrates O(n^2) performance.

Conclusion

Thus, given all of the above scenarios, it is actually quite easy to land on sorting almost sorted data. And that's why QuickSort, which pivots on the first item, is actually best avoided. polygene provided some interesting information on alternative rotation considerations.

As a side note: one of the usually worst-case sorting algorithms actually does a pretty good job of “almost-sorted” data. In the above data, only 9 swap operations are required to sort the bubbles. In fact, it will be O(n) .

+8
Jul 11 '14 at 17:38
source share

From quicksort

for quicksort, the “worst case” matches already sorted

A list with all the elements of the same number is already sorted.

+7
Mar 10 '10 at 7:32
source share

worst case in quicksort:

  • All elements of the array are the same.
  • The array is already sorted in the same order.
  • The array is already sorted in reverse order.
+3
May 28 '13 at 8:31
source share

The quick worst case depends on the choice of the rotation element. therefore, the problem arises only when 1) The array is already sorted in the same order. 2) The array is already sorted in reverse order. 3) All elements are the same (special case of case 1 and 2)

+1
May 13 '16 at 6:05
source share



All Articles