IMO there are two parts to concurrent programming: identifying parallelism and specifying parallelism. Identifying means breaking the algorithm into concurrent chunks of work; specifying means doing the actual coding and debugging. Identifying does not depend on which framework you will use to specify parallelism, and I do not think a framework can help much there. It comes from a good understanding of your application, the target platform, the common parallel programming trade-offs (hardware latencies, etc.), and most importantly, experience. Specifying, however, can be discussed, and that is what I try to answer below:
I have tried a lot of frameworks (at school and at work). Since you asked about multi-core processors, which are shared-memory machines, I will stick to the three shared-memory frameworks I have used.
Pthreads (not really a framework, but definitely applicable):
Pros:
- Pthreads is extremely general. To me, pthreads is like the assembly language of parallel programming. You can code any paradigm in pthreads.
- It is flexible, so you can make it as high-performance as you want. There are no inherent limitations to slow you down. You can write your own constructs and primitives and get as much speed as possible.
Cons:
- You have to do all the plumbing yourself, such as managing work queues and distributing tasks.
- The actual syntax is ugly, and your application often ends up with a lot of extra code that makes it hard to write and then hard to read.
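To make the "plumbing" point concrete, here is a minimal sketch of a parallel sum written directly against pthreads. The names `SumTask`, `sum_range`, and `parallel_sum_pthreads` are made up for illustration, and the static chunking is just one assumed way to split the work. Note how much of the code is bookkeeping rather than the sum itself.

```cpp
#include <pthread.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread work descriptor: with pthreads you define this yourself.
struct SumTask {
    const int* data;
    std::size_t begin;
    std::size_t end;
    long long partial;
};

// Thread entry point. pthreads only hands over a void*, so we cast back by hand.
static void* sum_range(void* arg) {
    SumTask* task = static_cast<SumTask*>(arg);
    long long acc = 0;
    for (std::size_t i = task->begin; i < task->end; ++i) {
        acc += task->data[i];
    }
    task->partial = acc;
    return nullptr;
}

long long parallel_sum_pthreads(const std::vector<int>& values, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<SumTask> tasks(num_threads);
    std::vector<pthread_t> threads(num_threads);
    std::vector<bool> started(num_threads, false);

    // Manual work distribution: one contiguous chunk per thread.
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, values.size());
        const std::size_t end = std::min(begin + chunk, values.size());
        tasks[t] = SumTask{values.data(), begin, end, 0};
        if (pthread_create(&threads[t], nullptr, sum_range, &tasks[t]) == 0) {
            started[t] = true;
        } else {
            sum_range(&tasks[t]);  // Could not start a thread: do the chunk inline.
        }
    }

    // Manual synchronization and reduction of the partial results.
    long long total = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        if (started[t]) {
            pthread_join(threads[t], nullptr);
        }
        total += tasks[t].partial;
    }
    return total;
}
```

With GCC or Clang this is typically built with the `-pthread` flag. A real work queue with dynamic task distribution would need a mutex and condition variable on top of this, which is exactly the extra code the con above refers to.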
OpenMP:
Pros:
- Code looks clean; the plumbing and task splitting are mostly under the hood.
- Semi-flexible. It gives you several interesting options for scheduling.
Cons:
- Meant for simple for-loop-style parallelism. (The newer Intel version also supports tasks, but the tasks are the same as Cilk's.)
- The underlying structures may or may not be well written for performance. The GNU implementation is just OK. Intel's ICC worked better for me, but I would rather write some things myself for higher performance.
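For comparison, a minimal sketch of the loop-style parallelism OpenMP is aimed at: the same kind of sum, with the thread management and the reduction left to the runtime. The function name `parallel_sum_openmp` is made up for illustration, and `schedule(static)` is only one of the scheduling choices (`dynamic` and `guided` are others).

```cpp
#include <cstddef>
#include <vector>

long long parallel_sum_openmp(const std::vector<int>& values) {
    long long total = 0;
    // A signed loop index keeps this compatible with older OpenMP versions.
    const long long n = static_cast<long long>(values.size());

    // The runtime splits the iterations across threads and combines the
    // per-thread copies of `total` at the end of the loop.
    #pragma omp parallel for reduction(+:total) schedule(static)
    for (long long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

With GCC this needs `-fopenmp` to actually run in parallel; without it the pragma is ignored and the loop runs serially with the same result. That is part of why the code stays clean, and also why you have little say over what happens underneath.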
Cilk, Intel TBB, Apple GCD:
Pros:
- Provably optimal underlying algorithms for task-level parallelism.
- Decent control of serial/parallel tasks.
- TBB also has a pipeline parallelism framework, which I used (it is not the best, to be frank).
- Eliminates the job of writing a lot of code for task-based systems, which can be a big plus if you are short-handed.
Cons:
- Less control over the performance of the underlying structures. I know that Intel TBB had very poorly performing underlying data structures; for example, the work queue was bad (in 2008, when I looked at it).
- The code sometimes looks awful with all the keywords and buzzwords they want you to use.
- Requires reading a lot of reference material to understand their "flexible" APIs.
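To show the shape of task-level parallelism without depending on any of these libraries, here is a rough fork-join sketch using standard C++ `std::async` as a stand-in. It is not Cilk, TBB, or GCD code, and it has no work-stealing scheduler, so it illustrates only the spawn/sync programming style, not their performance. The name `task_sum` and the `grain` cutoff parameter are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Recursively split [begin, end) into two tasks until a chunk is small enough.
long long task_sum(const int* data, std::size_t begin, std::size_t end, std::size_t grain) {
    // Below the cutoff, fall back to a plain serial sum. A cutoff of at least
    // one element guarantees the recursion terminates.
    if (end - begin <= std::max<std::size_t>(grain, 1)) {
        return std::accumulate(data + begin, data + end, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;

    // "Spawn": the left half runs as a separate asynchronous task.
    std::future<long long> left =
        std::async(std::launch::async, task_sum, data, begin, mid, grain);

    // The current thread continues with the right half.
    const long long right = task_sum(data, mid, end, grain);

    // "Sync": wait for the spawned task and combine the results.
    return left.get() + right;
}
```

A call such as `task_sum(v.data(), 0, v.size(), 250)` on a vector `v` spawns a handful of tasks. Each `std::async` call here may start a new thread, so the cutoff has to stay coarse; a task framework's scheduler is what makes fine-grained tasks affordable.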