Concurrent Merge Sort with threads / many / slower than Seq. Merge sort. Help

http://pastebin.com/YMS4ehRj

^ This is my parallel merge sort implementation. Basically, what I do is, for each partition the first half is processed by the stream, while the second half is sequential (ie), We have an array of 9 elements, [0..4] is processed by Thread 1, [0..1 ] Thread 2, [5..6] is processed by thread 3 (look at the source code for explanation).

Everything else remains the same as Unification. But the problem is that it works much slower than merge sorting, even slower than regular bubble sorting! And I mean an array of 25000 int. I'm not sure where the bottleneck is: is this a mutex lock? Is this a merger?

Any ideas on how to make this faster?

-one
source share
3 answers

You create a large number of threads, each of which then only works very little. To sort 25,000 ints, you create about 12,500 threads that spawn other threads and combine their results, and about 12,500 threads that sort only two ints.

The overhead of creating all of these threads far outweighs the benefits of parallel processing.

To avoid this, make sure that each thread has enough work. For example, if one thread discovers that it needs to sort <10,000 numbers, it can simply sort them by itself with the usual merge sort, instead of creating new threads.

+2
source

Given that you have a finite number of cores on your system, why do you want to create more threads than cores?

It is also unclear why you need to have a mutex at all. As far as I can tell from the quick scan, the program does not need to share the [lthreadcnt] threads outside the local function. Just use a local variable and you have to be gold.

+1
source

Your parallelism is too fine-grained, too many threads that do a little work. You can define a threshold so that arrays that are smaller than the threshold are sorted sequentially. Be careful with the number of threads generated, a good indicator is that the number of threads is usually not much more than the number of cores.

Since most of your calculations are in the merge function, another suggestion uses Divide-and-Conquer instead of just merging. Twice the advantage: the runtime is shorter and it is easy to create threads for parallel merging. You can get an idea of ​​how to implement parallel merging here: http://drdobbs.com/high-performance-computing/229204454 . They also have an article on merge parallel sorting that may be useful to you: http://drdobbs.com/high-performance-computing/229400239

+1
source

Source: https://habr.com/ru/post/1396734/


All Articles