Parallel Sorting in Java

Question

Parallel Sorting in Java

I am currently working on a program to sort strings at the same time. My program takes a file, reads each line of the file into an array, and splits the string array into smaller string arrays. The program then starts one thread for each of the smaller arrays and quickly sorts them. As soon as each thread has finished sorting its array, the main thread collects all the results from the stream objects. It is then proposed to combine the smaller, now sorted arrays into one large sorted array.

I know that my quicksort implementation works - using a single thread, the program sorts words. I need an algorithm to combine arrays returned by threads.

Any help is appreciated - thanks in advance.

+4

java sorting concurrency

Henrik Hillestad Løvold May 06 '13 at 12:02

source share

4 answers

I think using merge sort is pretty standard.

I suggest using as many threads as you have a CPU.

You may find that reading a file is a large percentage of the time, so something that can sort the lines when you read them may be faster.

eg. sorting radix with TreeSets can be faster since it will be sorted by the time you read the file.

+1

Peter Lawrey May 06 '13 at 12:04

source share

Here you can use the merge procedure. The algorithm is pretty simple, see Combine Sorting on Wikipedia. Usage can use simple two-way merge when combining two arrays or multiple merging while combining several arrays at the same time.

Also check out this work: Parallel QuickSort and RadixSort at optimal speed .

Finally, there is also a 3-way string quicksort that can be matched.

+1

tcb May 06 '13 at 12:20

source share

As mentioned in another post, the last step in your algorithm is the union.

However, quicksort itself is a recursive algorithm and allows the natural introduction of concurrency, so your "merge step" is deprecated, see for example http://ricardozuasti.com/2012/java-concurrency-examples-forkjoin-framework/

After the rotation element is in the final position, you call quick sorting into two sections. This can be done at the same time. Since this is recursive, it will span other threads.

+1

Christian fries May 06 '13 at 12:35

source share

Stefano sanfilippo · Accepted Answer · 2013-05-06T12:17:46+0000

Start with the final merge mergesort procedure. You read the first value of each of your m-arrays (minimum one subarray), then you select the minimum m read values (global minimum), insert it into the result and delete it from the containing array or increase the corresponding index by one. Then, iterate until all subarrays are empty or all indexes reach the end of the corresponding arrays.

NOTE. This can reduce memory usage if you have a really large data set (it is actually used to handle such situations), but may work worse than the original Quicksort, due to the cost (which becomes linear if you copy by subarray) and multi-threaded overhead data. Note that in a place where Mergesort is more economical in area when applied to large arrays. Also think that whoever wrote Quicksort, you are probably using the time spent optimizing calls and branch execution.

This is a basic theoretical CS, but note that you cannot lower the class of computational complexity simply using parallelism, you will only get linear acceleration. Finally, Quicksort falls into the lower limit of medium complexity for comparison sorting algorithms: if you are trying to outperform Quicksort O(nlog(n)) , I have bad news for you.

Parallel Sorting in Java

More articles: