Out-of-order execution in the CPU means that the CPU can reorder instructions to get better performance, and it also means the CPU has to do some very clever bookkeeping to make that work. There are other processor approaches to parallelism too, such as hyperthreading.
Some clever compilers understand the (in)dependence of instructions to a limited extent and will automatically interleave instruction streams (possibly over a longer window than the processor sees) to make better use of the processor. Deliberate compile-time interleaving of floating-point and integer instructions is another example of this.
Now I have a highly parallel task, and I typically have an aging single-core x86 processor without hyperthreading.
Is there a straightforward way to write the body of my "for" loop for this highly parallel task so that two (or more) iterations execute together, interleaved? (This is slightly different from loop unrolling, as I understand it.)
My task is a "virtual machine" running through a set of instructions, which I have simplified greatly for illustration:
void run(int num) {
    for (int n = 0; n < num; n++) {
        vm_t data(n);
        for (int i = 0; i < data.len(); i++) {
            data.insn(i).parse();
            data.insn(i).eval();
        }
    }
}

Thus, the execution trace may look like this:
data(1) insn(0) parse
data(1) insn(0) eval
data(1) insn(1) parse
...
data(2) insn(1) eval
data(2) insn(2) parse
data(2) insn(2) eval
Now, I would like to be able to do two (or more) iterations explicitly in parallel:
data(1) insn(0) parse
data(2) insn(0) parse \ processor can do OOO as these two flow in
data(1) insn(0) eval /
data(2) insn(0) eval \ OOO opportunity here too
data(1) insn(1) parse /
data(2) insn(1) parse
Profiling the real code with Callgrind (--simulate-cache=yes) shows that eval is the expensive step, apparently because of cache misses. While one iteration's eval is stalled waiting on memory, the parse and eval of another, independent iteration could proceed.

Is there a direct way to express this kind of parallelism in C++? I know the compiler or the out-of-order hardware may already be managing some of this behind the scenes, but I would like to structure the loop so that the opportunity is explicit. How can I do that?
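To make the idea concrete, here is a minimal self-contained sketch of interleaving two outer iterations by hand (software pipelining done manually). Note the `vm_t`/`insn_t` classes here are stand-in stubs with a fixed instruction count, invented just so the example compiles and the schedule can be observed; only the loop structure is the point:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Global trace so the schedule can be observed; the real code does work here.
std::vector<std::string> trace;

// Stub instruction/VM types mirroring the interface sketched in the question.
struct insn_t {
    int vm, idx;
    void parse() { trace.push_back("data(" + std::to_string(vm) + ") insn(" + std::to_string(idx) + ") parse"); }
    void eval()  { trace.push_back("data(" + std::to_string(vm) + ") insn(" + std::to_string(idx) + ") eval"); }
};

struct vm_t {
    int id;
    explicit vm_t(int n) : id(n) {}
    int len() const { return 2; }                  // fixed length for the demo
    insn_t insn(int i) const { return insn_t{id, i}; }
};

// Manual two-way interleave: pair up outer iterations so that independent
// parse/eval calls from different vm_t objects sit next to each other,
// giving the out-of-order core more independent work in flight.
void run_interleaved(int num) {
    int n = 0;
    for (; n + 1 < num; n += 2) {
        vm_t a(n), b(n + 1);
        int common = std::min(a.len(), b.len());
        for (int i = 0; i < common; i++) {
            a.insn(i).parse();
            b.insn(i).parse();   // independent of a's parse: OOO opportunity
            a.insn(i).eval();
            b.insn(i).eval();    // likewise independent of a's eval
        }
        // Drain whichever instruction stream is longer.
        for (int i = common; i < a.len(); i++) { a.insn(i).parse(); a.insn(i).eval(); }
        for (int i = common; i < b.len(); i++) { b.insn(i).parse(); b.insn(i).eval(); }
    }
    if (n < num) {               // leftover odd iteration runs unpaired
        vm_t d(n);
        for (int i = 0; i < d.len(); i++) { d.insn(i).parse(); d.insn(i).eval(); }
    }
}
```

This produces exactly the alternating `data(0)`/`data(1)` schedule shown above, but it duplicates the loop body and needs drain/leftover handling, which is why I am hoping for something cleaner.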