How does a machine with a higher processor performance (according to gprof) have worse real-time performance?

Background

I have an intensive computing program that I am trying to run on a single node supercomputer. Here are the specifications for one of the supercomputer nodes:

  • OS: Redhat 6 Enterprise 64-bit
  • Processor: Intel 2x 6-core 2.8 GHz (12 cores) - 12 MB Cache
  • RAM: 24 GB @ ???? MHz (no sudo access to check dmidecode)

I also tested this program on an Ubuntu virtual machine running on my MacBook:

  • OS: Ubuntu 13.10 64-bit
  • CPU: Intel 4x 2.30GHz (4 cores) - Cache 6MB
  • RAM: 3 GB at 1600 MHz

The program is built with the same version of gcc on both machines. However, for a simplified test run of the program, the real time to run the program on a supercomputer is much longer than on my virtual machine.

This made no sense to me, and to make it more confusing when I run gprofin my program, it shows that the supercomputer is really faster than my virtual machine. The table below shows the different times that I see for my program on each machine (SC = supercomputer, VM = virtual machine):

                            | SC   | VM    |
|---------------------------|------|-------|
| Release (-O3) Real Time   | 15s  | 3s    |
| Debug (-g -pg) Real Time  | 55s  | 35s   |
| Debug (-g -pg) gprof Time | 6.10 | 9.24s |

This happens regardless of how many times I test it, and in the case of a supercomputer, I am the only user in the node calculation when the program is running (that is, it should not contradict other processes).

-. 1.4MB 82 .

, , gprof , ? ?

. - openmpi, , .

, , Matrix Market (690 - "" 2 ) , . ( 48 ) , ( 74 ). , , , I/O, I/O.

CPU

SC

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 0
cpu cores   : 6
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 1
cpu cores   : 6
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 2
cpu cores   : 6
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 8
cpu cores   : 6
apicid      : 16
initial apicid  : 16
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 4
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 9
cpu cores   : 6
apicid      : 18
initial apicid  : 18
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 5
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 0
siblings    : 6
core id     : 10
cpu cores   : 6
apicid      : 20
initial apicid  : 20
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5600.41
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 6
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 0
cpu cores   : 6
apicid      : 32
initial apicid  : 32
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 7
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 1
cpu cores   : 6
apicid      : 34
initial apicid  : 34
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 8
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 2
cpu cores   : 6
apicid      : 36
initial apicid  : 36
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 9
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 8
cpu cores   : 6
apicid      : 48
initial apicid  : 48
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 10
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 9
cpu cores   : 6
apicid      : 50
initial apicid  : 50
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 11
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping    : 2
cpu MHz     : 2800.207
cache size  : 12288 KB
physical id : 1
siblings    : 6
core id     : 10
cpu cores   : 6
apicid      : 52
initial apicid  : 52
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5599.85
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

VM

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
stepping    : 9
microcode   : 0x15
cpu MHz     : 2294.125
cache size  : 6144 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase smep
bogomips    : 4588.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
stepping    : 9
microcode   : 0x15
cpu MHz     : 2294.125
cache size  : 6144 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase smep
bogomips    : 4588.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
stepping    : 9
microcode   : 0x15
cpu MHz     : 2294.125
cache size  : 6144 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase smep
bogomips    : 4588.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
stepping    : 9
microcode   : 0x15
cpu MHz     : 2294.125
cache size  : 6144 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase smep
bogomips    : 4588.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
+4
2

, , , . , , . Westmere ccNUMA (- ). NUMA , , - . , , , . MacBook NUMA.

, ( ) ccNUMA. ( ). NUMA node , . - , . . , , L1 L2, L1, L2 L3.

taskset numactl. , . OpenMP 4.0 , OpenMP ( 3.1), . GCC/libgomp , GOMP_CPU_AFFINITY. 12- node :

GOMP_CPU_AFFINITY="0-11" ./executable

OpenMP , , . . VM node. OMP_NUM_THREADS OpenMP, .

- :

GOMP_CPU_AFFINITY="0-3" OMP_NUM_THREADS=4 ./executable

. NUMA OpenMP. L3 ( Ivy Bridge L3, Sandy Bridge), Ivy Bridge, (, X5660 TurboBoost?) , , solver, @amdn.

+1

3. , -, , AVX. MacBook , - , , AVX. , 33 , 690 15 , MacBook 71 , 690 3 . 48 74 MacBook.

2: , , , MacBook . 690 , , 1,4 , . MacBook . X5660 Xeon . , , , (3 2 ). , , MacBook , , MacBook AVX, .

, , CPU. "" , , , Intel® Advanced Vector Extensions (Intel® AVX), MacBook. AVX

Intel Engineers

SSE AVX Sandy Bridge Ivy Bridge, ~ 3x ~ 6x ( sqrt), (C/++) Intel ++ 13.0.0.089 ( ).

AVX. L1, , , MKL.

http://ark.intel.com/compare/47921,64900

enter image description here

+3

Source: https://habr.com/ru/post/1533617/


All Articles