Although one compiler may be better than another across the board, I think it usually comes down to your specific application. Take a small portion of your code, the kernel loop you want to be fast, wrap it in a simple test of about 10 lines, and time that loop with each compiler you are considering.
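As a rough sketch of what such a test could look like in C: the `kernel` function below is only a hypothetical stand-in (a scaled vector add) for whatever loop matters in your code, and `bench_kernel`, the array size, and the repeat count are likewise assumptions made for illustration. It times the loop with the standard `clock()` function and keeps the best of several runs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for "your" kernel loop: y[i] += a * x[i].
 * Replace the body with the loop you actually care about. */
static void kernel(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Runs the kernel `reps` times on arrays of length n and returns the
 * best CPU time in seconds, or a negative value if allocation fails.
 * The sum of the result is stored in *checksum so the compiler cannot
 * throw the work away as unused. */
static double bench_kernel(size_t n, int reps, double *checksum)
{
    assert(reps > 0);

    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; ++i) {
        x[i] = 1.0;
        y[i] = 0.0;
    }

    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        clock_t t0 = clock();
        kernel(n, 0.5, x, y);
        clock_t t1 = clock();
        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += y[i];
    *checksum = sum;

    free(x);
    free(y);
    return best;
}

#ifdef BENCH_STANDALONE
/* Built only when BENCH_STANDALONE is defined (an assumed macro name). */
int main(void)
{
    double checksum = 0.0;
    double best = bench_kernel((size_t)1 << 20, 20, &checksum);
    if (best < 0.0)
        return 1;
    printf("best: %.6f s (checksum %.1f)\n", best, checksum);
    return 0;
}
#endif
```

Assuming the file is saved as `bench.c`, you could build it once per compiler with the same flags, for example `gcc -O2 -DBENCH_STANDALONE bench.c` and `clang -O2 -DBENCH_STANDALONE bench.c`, and compare the reported times. Note that `clock()` measures CPU time at fairly coarse resolution, so the array should be large enough for each run to take a measurable amount of time.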