I would treat this as a degenerate case of parallel sample sort. (Parallel code for sample sort can be found here.) Let N be the number of elements. The degenerate sample sort requires Θ(N) temporary space, has Θ(N) work, and has Θ(P + lg N) span (critical path). The last two values matter for the analysis, since speedup is limited to work/span.
I assume that the input is a random access sequence. Steps:
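1. Allocate a temporary array large enough to hold a copy of the input sequence.
2. Divide the input into K blocks. K is a tuning parameter. For a system with P hardware threads, K = max(4 * P, L) might be a good choice, where L is a constant that avoids absurdly small blocks. The "4 * P" allows some load balancing.
3. Move each block to its corresponding position in the temporary array and partition it there with std::partition. The blocks can be processed in parallel. Remember the offset of the "middle" of each block. You might consider writing a custom routine that both moves (in the C++11 sense) and partitions a block.
4. Compute the offset where each part of each block should go in the final result. The offsets for the first parts come from an exclusive prefix sum over the middle offsets from step 3. The offsets for the second parts can be computed the same way, using each middle's offset relative to the end of its block. Unless you have more than 100 hardware threads, I recommend a serial exclusive scan.
5. Move the two parts of each block from the temporary array back to their places in the original sequence. The blocks can be copied in parallel.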
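The five steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not a tuned implementation: the names `run_blocks` and `parallel_partition_sketch` are hypothetical, the input is a `std::vector<T>` with default-constructible, move-assignable `T`, the predicate is safe to call from several threads, and one `std::thread` is launched per block purely to keep the example standard-only (a real version would use a thread pool or TBB). The second-part offsets are computed from the total size of the first parts rather than from the end of the output, which gives the same layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: run body(b) for b in [0, k), one thread per block.
template <typename Body>
void run_blocks(std::size_t k, Body body) {
    std::vector<std::thread> workers;
    workers.reserve(k);
    for (std::size_t b = 0; b < k; ++b) workers.emplace_back(body, b);
    for (std::thread& w : workers) w.join();
}

// Hypothetical sketch: moves every element satisfying pred to the front of
// data and returns the index of the first element that does not satisfy it.
template <typename T, typename Pred>
std::size_t parallel_partition_sketch(std::vector<T>& data, Pred pred,
                                      std::size_t num_blocks) {
    const std::size_t n = data.size();
    if (n == 0) return 0;
    const std::size_t k = std::max<std::size_t>(1, std::min(num_blocks, n));

    // Step 1: temporary array of the same size as the input.
    std::vector<T> tmp(n);
    T* src = data.data();
    T* dst = tmp.data();

    // Step 2: block b covers [bounds[b], bounds[b + 1]).
    // (b * n may overflow for enormous inputs; acceptable for a sketch.)
    std::vector<std::size_t> bounds(k + 1), mid(k);
    for (std::size_t b = 0; b <= k; ++b) bounds[b] = b * n / k;

    // Step 3: move each block into tmp and partition it there, in parallel.
    run_blocks(k, [&](std::size_t b) {
        T* first = dst + bounds[b];
        T* last = std::move(src + bounds[b], src + bounds[b + 1], first);
        T* m = std::partition(first, last, pred);
        mid[b] = static_cast<std::size_t>(m - dst);  // remember the "middle"
    });

    // Step 4: serial exclusive scans give the destination of each part.
    std::vector<std::size_t> front(k), back(k);
    std::size_t total_true = 0;
    for (std::size_t b = 0; b < k; ++b) {
        front[b] = total_true;
        total_true += mid[b] - bounds[b];
    }
    std::size_t pos = total_true;
    for (std::size_t b = 0; b < k; ++b) {
        back[b] = pos;
        pos += bounds[b + 1] - mid[b];
    }
    assert(pos == n);  // every element has exactly one destination

    // Step 5: move both parts of each block back, in parallel.
    // Destination ranges are disjoint, so no synchronization is needed.
    run_blocks(k, [&](std::size_t b) {
        std::move(dst + bounds[b], dst + mid[b], src + front[b]);
        std::move(dst + mid[b], dst + bounds[b + 1], src + back[b]);
    });
    return total_true;
}
```

A caller would typically pass something like max(4 * P, L) as `num_blocks`, with P taken from `std::thread::hardware_concurrency()`. Like `std::partition`, the sketch makes no promise about the relative order of elements within each part.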
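There is a way to embed the scan of step 4 into steps 3 and 5, which would reduce the span to Θ(lg N), but I doubt it is worth the additional complexity.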
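If you use tbb::parallel_for loops to parallelize steps 3 and 5, consider using affinity_partitioner so that the threads in step 5 pick up what they left in cache in step 3.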
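Note that partitioning requires only Θ(N) work for Θ(N) memory loads and stores, so memory bandwidth can easily become the limiting resource for speedup.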