While parallelizing two nested for loops, I came across behavior that I cannot explain. I tried three different OpenMP parallelization variants on an i7 860 and a Xeon E5540, and I expected the code to behave more or less the same on both platforms, meaning that one platform should be faster in all three cases. But this is not so:
- For case 1, the Xeon is faster by ~10%,
- for case 2, the i7 is 2 times faster, and
- for case 3, the Xeon is again faster by a factor of 1.5.
Do you have any idea what might cause this?
Please let me know if you need more information or clarification!
To clarify, my question is more general: if I run the same code on the i7 and on the Xeon system, shouldn't OpenMP produce comparable (proportional) results across the two?
Pseudo code:

for 1:4
    for 1:1000
        vector_multiplication
    end
end
Cases:
case 1: no pragma omp, no parallelization
case 2: pragma omp for on the first (outer) loop
case 3: pragma omp for on the second (inner) loop
Results
Here are the actual numbers from the `time` command:
case 1
Time Xeon i7
real 11m14.120s 12m53.679s
user 11m14.030s 12m46.220s
sys 0m0.080s 0m0.176s
case 2
Time Xeon i7
real 8m57.144s 4m37.859s
user 71m10.530s 29m07.797s
sys 0m0.300s 0m00.128s
case 3
Time Xeon i7
real 2m00.234s 3m35.866s
user 11m52.870s 22m10.799s
sys 0m00.170s 0m00.136s
[Update]
Thanks for all the tips. I am still investigating the cause.