I have no experience with the Intel compiler, so I can't say whether you are missing any flags.
However, from what I remember, recent versions of gcc are generally about as good at optimizing code as icc (sometimes better, sometimes worse, although most sources seem to suggest generally better), so you may have run into a situation where icc does particularly badly. Examples of the optimizations each compiler can perform can be found here and here. Even if gcc is not better in general, you may simply have a case that gcc recognizes as optimizable and icc does not. Compilers can be very picky about what they optimize and what they do not, especially for things like autovectorization.
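To illustrate how picky autovectorization can be, here is a minimal sketch (my own toy example with hypothetical function names, not your code). The two functions do the same work, but in the first the compiler has to allow for `dst` and `src` overlapping, while in the second `restrict` promises they do not. Whether either compiler actually vectorizes either version is something you would have to check yourself. With gcc, building with `-O3 -fopt-info-vec` reports the loops it vectorized.

```c
#include <stddef.h>

/* dst and src may alias here, so the compiler must prove independence,
   emit a runtime overlap check, or fall back to scalar code. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}

/* restrict tells the compiler the two arrays never overlap, which
   removes one common obstacle to vectorizing the loop. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Both versions compute the same result when the arrays are distinct; only the information available to the optimizer differs.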
If your loop is small enough, it might be worth comparing the assembly that gcc and icc generate for it. Also, if you show some code, or at least tell us what you are doing in your loop, we could make better guesses about what causes this behavior. If it is a relatively small loop, this is most likely a case of icc missing one (or a few, but probably not many) optimizations that either have inherently high potential (prefetching, autovectorization, unrolling, loop-invariant code motion, ...) or that enable other optimizations (primarily inlining).
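As a rough sketch of the kind of small loop where one missed optimization matters, consider hoisting a loop-invariant computation. The functions below are hypothetical examples, not taken from your code. In the first, `strlen(s)` is re-evaluated in the loop condition unless the compiler proves it is invariant and moves it out. The second does that hoisting by hand. Compiling such a file with `gcc -O3 -S` and comparing the result with what the other compiler emits shows whether the call stayed inside the loop.

```c
#include <stddef.h>
#include <string.h>

/* strlen(s) does not change between iterations, but it is only
   computed once if the compiler hoists it out of the loop. */
size_t count_char(const char *s, char c)
{
    size_t hits = 0;
    for (size_t i = 0; i < strlen(s); ++i)
        if (s[i] == c)
            ++hits;
    return hits;
}

/* Same loop with the invariant hoisted manually, so the result no
   longer depends on the optimizer noticing it. */
size_t count_char_hoisted(const char *s, char c)
{
    const size_t len = strlen(s);
    size_t hits = 0;
    for (size_t i = 0; i < len; ++i)
        if (s[i] == c)
            ++hits;
    return hits;
}
```

If the two versions produce noticeably different assembly under one compiler but not the other, that is a strong hint about which optimization was missed.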
Note that I am only talking about optimization potential when comparing gcc with icc. In the end, icc can usually generate faster code than gcc, not so much because it performs more optimizations, but because it has a faster standard library implementation and because it is smarter about where to optimize. At high optimization levels gcc gets a little overeager (or at least it used to) about trading code size for (theoretical) runtime improvements. That can actually hurt performance, for example when a carefully unrolled and vectorized loop only ever executes three iterations.
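The three-iteration case can be sketched like this (again a toy example with hypothetical names). For the generic function the trip count is unknown at compile time, so an aggressive optimizer may emit an unrolled, vectorized main loop plus extra code for the leftover elements, none of which pays off if `n` is always 3. When the count really is fixed, saying so in the code avoids the issue.

```c
#include <stddef.h>

/* Trip count unknown at compile time: the compiler has to guess how
   much unrolling and vectorization is worth the extra code size. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Trip count fixed at three: there is no loop left to over-optimize. */
float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

If `dot` is only ever called with `n == 3`, measuring both variants is a quick way to see whether the generic loop's setup overhead is what you are paying for.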