Offload daemon on Xeon Phi 5110P

I know that the Intel Xeon Phi SE10X coprocessor has 61 cores and that it is recommended to use only 60 of them, since 1 core is used by the offload daemon. Since the Intel Xeon Phi 5110P coprocessor has 60 cores, is it recommended to use only 59 cores?

+4
3 answers

I benchmarked my test code on an Intel Xeon Phi 7120P card. I noticed that performance was best when the number of threads was a multiple of (number of cores - 1). This is because one of the cores is busy running services for the Linux OS.

Generally:

No. of threads to create >= K * T * (N - 1)

where:
K = positive integer (2 works fine)
T = number of hardware thread contexts per core (4 in my case)
N = number of cores present on the hardware
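As a rough illustration of the rule above, here is a minimal Python sketch that evaluates K * T * (N - 1). The function name `recommended_threads` and its parameter names are made up for this example, and the default of 4 thread contexts per core is an assumption taken from the card described in this answer.

```python
def recommended_threads(cores: int, contexts_per_core: int = 4, k: int = 1) -> int:
    """Evaluate K * T * (N - 1): leave one core out, fill the rest.

    cores             -- N, physical cores on the coprocessor
    contexts_per_core -- T, hardware thread contexts per core (assumed 4)
    k                 -- K, positive integer multiplier
    """
    if cores < 2:
        raise ValueError("need at least 2 cores to leave one free")
    if contexts_per_core < 1 or k < 1:
        raise ValueError("contexts_per_core and k must be positive")
    return k * contexts_per_core * (cores - 1)


if __name__ == "__main__":
    # Core counts: 61 for the 7120P, 60 for the 5110P.
    for name, cores in (("7120P", 61), ("5110P", 60)):
        for k in (1, 2):
            print(f"{name}: K={k} -> {recommended_threads(cores, k=k)} threads")
```

With K = 1 this gives 240 threads on a 61-core card and 236 threads on a 60-core card; whether a larger K helps depends on the workload.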
+1

From this MIC-related source:

Sensitive affinities

In Intel MPSS, many kernel services and daemons are affinitized to the Bootstrap Processor (BSP), which is the last physical core. This is also where the offload daemon runs the services required to support data transfer for offload. Therefore, it is generally wise to avoid using this core for user code. (Indeed, as already discussed, the offload system does this automatically by excluding the logical processors on the last core from the default affinity of offloaded processes.)

From this guide to OpenMP on MIC:

Offloaded programs inherit an affinity map that hides the last core, which is reserved for offload system functions. Native programs can use all cores, which makes the calculations needed to balance threads somewhat different.

Neither of these sources is specific to a particular MIC model; they describe the architecture. So it seems that if you offload to the device and do not use the default affinity, you really should avoid the last core.

+3

When you run your workload in offload mode (the application starts on the CPU and offloads some computation to the Xeon Phi), it is recommended to leave 1 core free for the offload runtime. On the Xeon Phi side, there is a COI daemon that runs four service threads to manage offload activity. Keep in mind that 1 physical core on the Xeon Phi runs 4 hardware threads. In the native execution model, where the application runs directly on the Xeon Phi card, you can use all available cores, since there is no offload activity.
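To make the difference between the two modes concrete, here is a small Python sketch that applies this rule to a given core count. The function name `usable_resources` and the mode strings "offload" and "native" are hypothetical names chosen for this example; 4 hardware threads per core is taken from the description above.

```python
def usable_resources(total_cores: int, mode: str, threads_per_core: int = 4):
    """Return (cores, hardware threads) available to user code.

    In "offload" mode one core is left to the offload runtime;
    in "native" mode every core is available.
    """
    if total_cores < 1 or threads_per_core < 1:
        raise ValueError("core and thread counts must be positive")
    if mode == "offload":
        cores = total_cores - 1  # last core reserved for offload services
    elif mode == "native":
        cores = total_cores      # no offload activity, use everything
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cores, cores * threads_per_core


if __name__ == "__main__":
    # 5110P from the question: 60 physical cores.
    for mode in ("offload", "native"):
        cores, threads = usable_resources(60, mode)
        print(f"5110P {mode}: {cores} cores, {threads} hardware threads")
```

For the 60-core 5110P from the question, this works out to 59 cores (236 hardware threads) in offload mode and 60 cores (240 hardware threads) in native mode.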

+1

Source: https://habr.com/ru/post/1485365/

