Hyper-threading doubles my simulation's execution time

I am running a simulation written in Python / NumPy / Cython. Since I need to average over many simulation runs, I use the multiprocessing module to launch the individual batch runs.

At the office I have an i7-920 workstation with HT; at home, an i5-560 without it. I figured I could run twice as many simulation instances per batch at the office and cut the total time in half. Surprisingly, the run time of each individual instance roughly doubles compared to what it takes on my home workstation. Running 3 copies of the simulation in parallel at home takes, say, 8 minutes, while launching 6 copies at the office takes about 15 minutes. Using 'cat /proc/cpuinfo', I confirmed 'siblings' = 8 and 'cpu cores' = 4, so HT is enabled.
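A minimal sketch of this kind of setup (the simulation body below is a hypothetical stand-in; the real work would be the NumPy/Cython model):

```python
import multiprocessing as mp
import os

def run_simulation(seed):
    """Hypothetical stand-in for one batch run of the numpy/cython model.
    A cheap LCG-based average keeps the example self-contained."""
    state = seed
    total = 0.0
    for _ in range(1000):
        state = (1103515245 * state + 12345) % 2**31
        total += state / 2**31
    return total / 1000

def average_over_batches(n_runs, n_workers):
    """Run n_runs simulations across n_workers processes and average them."""
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(run_simulation, range(n_runs))
    return sum(results) / len(results)

if __name__ == "__main__":
    # One worker per logical CPU: 8 on the i7-920 with HT, 4 on the i5-560.
    print(average_over_batches(n_runs=12, n_workers=os.cpu_count()))
```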

I'm not aware of any "conservation of total execution time" law (although from a scientific point of view it could be quite interesting :)), so I hope someone here can shed light on this riddle.

+4
4 answers

Perhaps the context switching causes more overhead, with 6 compute-heavy processes and only 4 real cores. When processes compete for CPU resources, they also use the CPU caches inefficiently.

What is the result if you run only 4 instances instead of 6?
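One way to try that is to size the pool by physical cores rather than logical CPUs. A sketch that estimates the physical core count from the same /proc/cpuinfo fields the question looked at (counting distinct 'physical id'/'core id' pairs), falling back to the logical count elsewhere:

```python
import multiprocessing as mp

def physical_core_estimate():
    """Estimate the number of physical cores on Linux by counting distinct
    (physical id, core id) pairs in /proc/cpuinfo. With HT on, this should
    be half the logical CPU count; falls back to the logical count if the
    file is unavailable (e.g. on non-Linux systems)."""
    try:
        cores = set()
        physical_id = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    physical_id = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    core_id = line.split(":")[1].strip()
                    cores.add((physical_id, core_id))
        if cores:
            return len(cores)
    except OSError:
        pass
    return mp.cpu_count()  # fall back to the logical CPU count

if __name__ == "__main__":
    # Size the pool to physical cores, e.g. 4 on the i7-920 despite 8 siblings.
    print(physical_core_estimate())
```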

+3

Hyper-threading can be good for some types of workload. Intensive numerical computation is not one of them: if you want to crunch numbers, you are better off turning hyper-threading off. What hyper-threading does provide is essentially a "free context switch" between tasks, but the processor only has so many execution units.

In this case it can make things worse, because the OS cannot know which processes are running on separate cores (where they get full performance) and which are sharing the same core on different "hyperthreads".

(In fact, I would wager that the Linux kernel can provide control over this, but Python's multiprocessing module will just start additional processes that use the default resource allocation.)

Bottom line: turn off HT if you can; at the very least you will then fully utilize the 4 cores.
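Short of disabling HT in the BIOS, on Linux you can approximate this by pinning each worker to its own logical CPU with os.sched_setaffinity. A sketch (the assumption that low-numbered logical CPUs map to distinct physical cores varies by machine; check /proc/cpuinfo for your topology):

```python
import os
import multiprocessing as mp

def worker(cpu):
    """Pin the calling worker process to a single logical CPU (Linux-only),
    then report its resulting affinity set. Heavy numerical work placed
    after the pinning call would stay confined to that CPU."""
    os.sched_setaffinity(0, {cpu})  # 0 = the calling process
    return sorted(os.sched_getaffinity(0))

if __name__ == "__main__":
    # Assumption: on this box, logical CPUs 0..3 sit on distinct physical
    # cores, so 4 pinned workers never share a core's execution units.
    n = min(4, os.cpu_count())
    with mp.Pool(n) as pool:
        print(pool.map(worker, range(n)))  # each worker sees exactly one CPU
```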

+5

Others have pretty much given you an idea of the problem; I just want to contribute by linking this article, which explains a bit more about how HT works and what the performance implications are for a multi-threaded program: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/

+1

On my HP workstation (16 cores; with hyper-threading it shows up as 32 processors), turning hyper-threading on even broke Python when running a numerical simulation, with error code 0x000005. It puzzled me for a long time until I turned HT off, and then the simulation ran fine! Maybe you can check and compare the runtime with HT both on and off.

0

Source: https://habr.com/ru/post/1385122/

