Strange slow fast sorting of large tables

Question

Strange slow fast sorting of large tables

I did my homework to compare a bunch of sorting algorithms, and I came across a strange phenomenon. Everything was as expected: insertionsort wins for something like a table of 20 ints, otherwise quicksort is superior to heapsort and mergesort. Up to the table 500 000 ints (stored in memory). For 5,000,000 ints (still in memory), quicksort becomes suddenly worse than heapsort and mergesort. Numbers are always evenly distributed randomly; Windows virtual memory is disabled. Does anyone have an idea what could be causing this?

void quicksortit(T *tab,int s) { if (s==0 || s==1) return; T tmp; if (s==2) { if (tab[0]>tab[1]) { tmp=tab[0]; tab[0]=tab[1]; tab[1]=tmp; } return; } T pivot=tab[s-1]; T *f1,*f2; f1=f2=tab; for(int i=0;i<s;i++) if (*f2>pivot) f2++; else { tmp=*f1; *f1=*f2; *f2=tmp; f1++; f2++; } quicksortit(tab,(f1-1)-tab); quicksortit(f1,f2-f1); };

+5

c ++ quicksort large-data

Domin Nov 03 '14 at 20:56

source share

2 answers

Perhaps your array is now larger than the L3 cache.

The quick sort operation moves random elements from one end of the array to the other. A typical Intel L3 cache is similar to 8MB. With 5M 4-byte elements - your array is 20 MB. and you write from one end to the other.

The cache skips L3, goes into main memory, and can be much slower than misses in a higher level cache.

So far, your entire sorting operation has fully worked inside the CPU.

+1

Rafael baptista Nov 03 '14 at 21:12

source share

2501 · Accepted Answer · 2014-11-03T21:54:07+0000

The algorithm starts with an error when there are a lot of duplicates in the array. You noticed this only with large values, because you feed the algorithm with random values that have a large range (I assume you used rand () with: 0 - RAND_MAX), and this problem only occurs with large arrays.

When you try to sort an array of identical numbers (try sorting 100,000 identical numbers, the program crashes), you will first go through the entire array, changing the elements too much. Then you split the array into two, but the large array is reduced by only 1:

  v quicksortit(tab,(f1-1)-tab);

Thus, your algorithm becomes O (n ^ 2), and you also consume a very large amount of stack. Finding the best rod will not help you in this case, rather choose a version of quicksort () that does not exhibit this drawback.

For instance:

 function quicksort(array) if length(array) > 1 pivot := select middle, or a median of first, last and middle left := first index of array right := last index of array while left <= right while array[left] < pivot left := left + 1 while array[right] > pivot right := right - 1 if left <= right swap array[left] with array[right] left := left + 1 right := right - 1 quicksort(array from first index to right) quicksort(array from left to last index)

This is a modified version: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort

Strange slow fast sorting of large tables

More articles: