What makes the gcc std :: list sort implementation so fast?

I have a linked list and I'm experimenting with both Mergesort and QuickSort algorithms.

I do not understand why the sort operation in std :: list is so fast. Looking at std :: list under linux, it is also a linked list, not an array.

Merge Sort. I tried an almost identical version of Dave Gamble here: Merge Sort linked list

Also, I thought I would try a simple quicksort based on this code: http://www.flipcode.com/archives/Quick_Sort_On_Linked_List.shtml

Surprisingly, sorting 10 million random numbers using std :: list and sort was about 10 times faster than any of these others.

And for those who ask, yes, I need to use my own list class for this project.

+10
algorithm linux g ++ stdlist
Jul 18 '11 at 4:18
source share
1 answer

I took a look at an interesting GLibC implementation for list :: sort ( source code ), and it doesn't seem to implement the traditional merge sort algorithm (at least not the one I have ever seen).

Basically what he does:

  • Creates a series of buckets (64 in total).
  • Deletes the first element of the list for sorting and combining it with the first ( i=0 th) bucket.
  • If the bucket i th is not empty before the merge, combine the bucket i th with the bucket i+1 th.
  • Repeat step 3 until we merge with an empty bucket.
  • Repeat steps 2 and 3 until the sort list is empty.
  • Combine all the remaining non-empty buckets together, starting with the smallest and largest.

A quick note: combining bucket X with bucket Y will remove all items from bucket X and add them to bucket Y , keeping all sorted. Also note that the number of elements in the bucket is either 0 or 2^i .

Now why is this happening faster than traditional merging? Well, I can’t say for sure, but here are a few things that come to mind:

  • It never crosses the list to find the midpoint, which also makes the algorithm more convenient for caching.
  • Because earlier buckets are small and used more often, merge calls less often destroy the cache.
  • The compiler optimizes this implementation better. You will need to compare the generated assembly to be sure of this.

I'm sure that the people who implemented this algorithm have tested it thoroughly, so if you want a final answer, you probably have to request the GCC mailing list.

+12
Jul 18 2018-11-11T00:
source share
β€” -



All Articles