In my experience, performance basically peaked at 3.4; 4.2 is actually slower than 3.4 on my project, and 4.3 is the first version to roughly match 3.4's performance. 4.4 is slightly faster than 3.4.
There are several cases where I found that older versions of gcc made some incredibly backward decisions in code generation: one particular function went from 128 to 21 clock cycles between 3.4 and 4.3, but that was obviously a special case (it was a short loop, where adding just a few extra instructions hurt performance badly).
3.4 , , . , , , ; --march core2 gcc segfaults , , autovectorized , .
, ; 3-5% - , .
, C; ++ .