I know this is an old thread, but I came here with the same problem and found a different cause and solution.
If you run your program under the Visual Studio debugger (VS2012 here), it seems that all of its threads end up pinned to a single core. To use all available cores, I had to run the executable directly or launch it from the Windows command prompt (cmd).
It may be very specific, but hope this helps.