Xeon Phi coprocessor versus Xeon Phi host processor?

What is the difference between a host processor and a coprocessor? In particular, the Xeon Phi coprocessor and the Xeon Phi host processor?

I have some results from these machines (running a parallel OpenMP code for the diffusion equation) which show that the host processor is much faster when the same number of threads is used. I would like to understand the differences and relate them to my results.

+5
2 answers

To restate what Jeff said in the comments: you have a Xeon host with a Xeon Phi coprocessor attached. The current generation of Xeon Phi (Knights Corner) is only available as a coprocessor, not as a standalone Xeon Phi host (which should become available with the next generation, Knights Landing).

When you run your program on the Xeon host without offloading, it seems from this website that you can use up to 16 threads. Note that each of your host cores runs at about 2.2 GHz.

When you run your program in native mode on the Xeon Phi coprocessor, you can use many more threads. The optimal number of threads depends on the Xeon Phi model you have (some work best with 56, others with 60). But note that each Xeon Phi core (approximately 1.2 GHz) is noticeably weaker than a Xeon core (approximately 2.2 GHz). The advantage of the Xeon Phi's many-core design is that you can use many cores at once.

The last, very important thing to keep in mind is that the Xeon Phi has a 512-bit SIMD instruction set. On the Xeon Phi coprocessor you can therefore get much more out of SIMD vectorization than on the host. In your case, I believe your Xeon host has only a 256-bit SIMD vector processing unit. So, if you have not already done so, you can improve your performance on the Xeon Phi (by up to 16x if you work in single precision) by taking advantage of SIMD vectorization. Your Xeon host will only give you up to 8x. To get you started on your search, OpenMP 4.0 lets you write things like #pragma omp simd to tell the compiler to vectorize the inner loops of your code. If you really want to get the most out of the Xeon Phi, adding SIMD vectorization is a must.
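As a rough illustration of the #pragma omp simd hint, here is a minimal sketch of one explicit time step of a 1D diffusion equation. The function name diffusion_step, the array names, and the coefficient r (standing for D*dt/dx^2) are hypothetical and not taken from the asker's code. Whether the loop is actually vectorized depends on the compiler and its flags.

```c
#include <assert.h>

/* One explicit finite-difference step of the 1D diffusion equation.
   u_old and u_new are arrays of n floats; r = D*dt/(dx*dx).
   All names here are illustrative assumptions. */
void diffusion_step(const float *restrict u_old, float *restrict u_new,
                    int n, float r)
{
    assert(n >= 3);

    /* Keep the boundary values fixed. */
    u_new[0] = u_old[0];
    u_new[n - 1] = u_old[n - 1];

    /* Iterations are independent, so the loop can be vectorized.
       Without OpenMP support the pragma is ignored and the loop runs as is. */
    #pragma omp simd
    for (int i = 1; i < n - 1; ++i)
        u_new[i] = u_old[i]
                 + r * (u_old[i - 1] - 2.0f * u_old[i] + u_old[i + 1]);
}
```

The restrict qualifiers tell the compiler that the two arrays do not overlap, which is usually what makes vectorizing this kind of stencil loop straightforward.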

So, to answer your question directly: comparing performance between the Xeon host and the Xeon Phi coprocessor using the same number of cores is meaningless. We already know that a Xeon Phi core is slower than a Xeon core. If you want a direct comparison, compare the results using the maximum number of cores each one allows (60 and 16, respectively), and with full use of the vector processing unit on each.
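To see why the full-machine comparison matters, here is a back-of-the-envelope sketch that multiplies cores, clock speed, and single-precision SIMD lanes, using the approximate figures quoted in this answer. The function peak_lane_ghz is a hypothetical helper, and the model is deliberately crude: it ignores memory bandwidth, fused multiply-add, and how well the code actually vectorizes, so it is an upper-bound intuition rather than a prediction.

```c
#include <assert.h>

/* Crude throughput figure: cores * GHz * SIMD lanes.
   simd_bits is the vector width, elem_bits the element size
   (32 for single precision). Hypothetical helper for illustration only. */
double peak_lane_ghz(int cores, double ghz, int simd_bits, int elem_bits)
{
    assert(cores > 0 && ghz > 0.0);
    assert(elem_bits > 0 && simd_bits % elem_bits == 0);

    int lanes = simd_bits / elem_bits;   /* 512/32 = 16, 256/32 = 8 */
    return cores * ghz * lanes;
}
```

With the numbers above, peak_lane_ghz(60, 1.2, 512, 32) comes out roughly four times larger than peak_lane_ghz(16, 2.2, 256, 32), whereas at equal thread counts and without vectorization the slower Xeon Phi cores lose.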

+5

If you are talking about the current generation (KNC) and not the next one (KNL), here are the definitions.

Host processor: the ~8 core / ~16 thread Xeon that hosts the coprocessor, that is, the Xeon to which the coprocessor is attached via the PCIe bus.

Coprocessor: the ~60 core / ~240 thread coprocessor that hangs off the Xeon host on the PCIe bus.

The host farms out highly parallel / vectorized jobs to the coprocessor, either by using offload directives or by running them natively on the coprocessor with some distributed programming paradigm such as MPI.
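As an illustration of the offload style, here is a minimal sketch using the standard OpenMP 4.0 target construct. This is one possible way to express an offload, not necessarily what the asker's toolchain uses (Intel's compiler also has its own offload pragmas). The function name scale_offload is hypothetical. If no offload device is available, the region simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of x by a, on the offload device if there is one.
   Hypothetical example; the map clause copies x to the device and back. */
void scale_offload(float *x, int n, float a)
{
    assert(x != NULL && n > 0);

    #pragma omp target map(tofrom: x[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}
```

The data transfer over PCIe implied by the map clause is not free, so offloading tends to pay off only when the offloaded region does enough work per byte moved.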

As for the comment about a next-generation host processor: the commenter is referring to the fact that the next generation of Xeon Phi (KNL) can be configured either as a coprocessor hanging off the PCIe bus (like the first-generation Xeon Phi, KNC) or as a regular processor that you plug into the motherboard.

+1

Source: https://habr.com/ru/post/1234658/

