Intel Compiler vs. GCC

When I compile an application with the Intel compiler, it runs slower than when I compile it with GCC: the Intel compiler's output is more than 2 times slower. The application contains several nested loops. Are there any differences between GCC and the Intel compiler that I'm missing? Do I need to add other flags to improve the Intel compiler's performance? I expected the Intel compiler to be faster than GCC.

Compiler Versions:

  Intel version 12.0.0 20101006 
  GCC version 4.4.4 20100630

The compiler flags are the same for both compilers:

-O3 -openmp -parallel -mSSE4.2 -Wall -pthread 
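One way to make such a comparison concrete is to time the hot nested loop in isolation under both compilers. The kernel below is a hypothetical stand-in (a naive matrix-vector product), not the asker's code; the names `matvec`, `now_seconds` and `time_matvec` are illustrative. The sketch assumes a POSIX system with `clock_gettime`, compilation as C99 (e.g. `-std=c99`), and, on older glibc versions, linking with `-lrt`.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the application's nested loops:
   y = M * x for an n x n row-major matrix M. */
void matvec(size_t n, const double *m, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += m[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Wall-clock time in seconds from a monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run the kernel `reps` times and return the elapsed wall-clock seconds. */
double time_matvec(size_t n, const double *m, const double *x, double *y,
                   int reps)
{
    assert(reps > 0);
    double start = now_seconds();
    for (int r = 0; r < reps; ++r)
        matvec(n, m, x, y);
    return now_seconds() - start;
}
```

Build the same file with each compiler at the same optimization level, call `time_matvec` from a small `main` on identical inputs, and print something derived from `y` so the compiler cannot discard the work.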
2 answers

I have no experience with the Intel compiler, so I can't say whether you are missing any flags.

However, from what I remember, recent versions of gcc are generally about as good at optimizing code as icc (sometimes better, sometimes worse, although most sources seem to favor icc), so you may have run into a case where icc does particularly badly. Examples of the optimizations each compiler can perform can be found here and here. Even if gcc is not better in general, you may simply have a case that gcc recognizes as optimizable and icc does not. Compilers can be very picky about what they optimize and what they don't, especially with things like autovectorization.
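Pointer aliasing is one common reason a compiler declines to autovectorize a loop. The sketch below (hypothetical function names, C99 for `restrict`) shows the same loop with and without `restrict`. To see what each compiler actually decided, GCC 4.4 can report vectorization with `-ftree-vectorizer-verbose=2` and icc 12 with `-vec-report2`.

```c
#include <assert.h>
#include <stddef.h>

/* The compiler must assume dst may overlap a or b, which can block
   vectorization or force it to add runtime overlap checks. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* `restrict` promises that the three arrays do not overlap, so the loop
   can be vectorized without an aliasing check. Passing overlapping
   arrays to this function is undefined behavior. */
void add_arrays_restrict(float *restrict dst, const float *restrict a,
                         const float *restrict b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Both functions compute the same result on non-overlapping inputs; only the information available to the optimizer differs.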

If your loop is small enough, it might be worth comparing the assembly code that gcc and icc generate for it. Also, if you show some code, or at least tell us what you are doing in your loops, we could give you better guesses about what causes this behavior. If it is a relatively small loop, the most likely explanation is that icc misses one (or a few, but probably not many) optimizations that either have good potential in themselves (prefetching, loop unrolling, constant propagation, loop-invariant code motion, ...) or enable other optimizations (mostly inlining).
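Loop-invariant code motion hoists a computation whose result does not change between iterations out of the loop. A minimal sketch, with hypothetical function names, of a case where a compiler generally cannot do this on its own: the load of `*norm` is invariant in intent, but because `v[i]` is written every iteration and may overlap `*norm`, the value has to be re-read each time.

```c
#include <assert.h>
#include <stddef.h>

/* *norm may alias some v[i], so the compiler generally has to reload it
   on every iteration instead of hoisting the load. */
void normalize_reload(double *v, size_t n, const double *norm)
{
    for (size_t i = 0; i < n; ++i)
        v[i] /= *norm;
}

/* Hand-hoisted version: read *norm once before the loop. This is the
   transformation loop-invariant code motion would perform if the
   compiler could prove v never overlaps *norm; here the caller must
   guarantee that instead. */
void normalize_hoisted(double *v, size_t n, const double *norm)
{
    assert(norm != NULL);
    const double d = *norm;
    for (size_t i = 0; i < n; ++i)
        v[i] /= d;
}
```

On non-overlapping data the two agree. If `norm` points into `v` (for example `norm == &v[0]`), they give different results, which is exactly why the compiler is not allowed to hoist the load by itself.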

Note that I'm only talking about optimization capability when comparing gcc with icc. In the end, icc can often generate faster code than gcc, but not so much because it performs more optimizations; rather, it has a faster standard library implementation and is smarter about where to optimize. At high optimization levels gcc tends to be a bit overeager (or at least used to be) about trading code size for (theoretical) runtime improvements, which can actually hurt performance, for example when a carefully unrolled and vectorized loop only ever executes three iterations.
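The point about short trip counts can be illustrated with a hand-unrolled loop. The sketch below (hypothetical names) unrolls a sum by 4 with partial accumulators, roughly the shape an optimizer may produce; it uses integers so that reordering the additions does not change the result. For `n == 3` the unrolled main loop never runs and all the work falls to the remainder loop, so the extra code only adds size and branches.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one addition per iteration. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Unrolled by 4 with four independent partial sums. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) { /* main body: only when n >= 4 */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) /* remainder: 0 to 3 leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```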


I usually use -inline-level=1 -inline-forceinline to make sure that the functions I explicitly declared inline are actually inlined. Other than that, I would expect ICC performance to be at least as good as gcc's. You will need to profile your code to find out what causes the performance difference. If you are on Linux, I recommend Zoom, which you can get as a free 30-day trial.
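A minimal sketch of the kind of code those inlining flags act on: a small helper marked inline and called from a hot loop. `lerp`, `blend` and the `FORCE_INLINE` macro are hypothetical. The macro uses GCC's `always_inline` attribute, which is roughly GCC's per-function counterpart to icc's `-inline-forceinline`; icc on Linux defines `__GNUC__` and should accept the same attribute, but treat that as an assumption for your version.

```c
#include <assert.h>

/* Hypothetical macro: force inlining where the GNU always_inline
   attribute is understood, otherwise fall back to a plain inline hint. */
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

/* Small helper used in the hot loop: linear interpolation. */
static FORCE_INLINE double lerp(double a, double b, double t)
{
    return a + t * (b - a);
}

/* Hot loop: if lerp were not inlined, every element would pay for a call
   and the loop would be harder to vectorize. */
void blend(double *out, const double *a, const double *b, double t, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = lerp(a[i], b[i], t);
}
```

Compiling with `-S` under each compiler and checking whether a call to `lerp` remains inside `blend` is a quick way to confirm the inlining actually happened.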


Source: https://habr.com/ru/post/902751/

