How big is the branch prediction buffer for a typical modern processor?

Question

How big is the branch prediction buffer for a typical modern processor?

The application I'm dealing with has a large number of if statements with the characteristic that, in any one execution, only one of the branches is executed 90% of the time.

I can check the effect of branch prediction on a single if statement for a specific processor with something like this:

    #include <iostream>
    #include <stdlib.h>
    using namespace std;

    int main() {
        int a;
        cin >> a;
        srand(a);
        int b;
        long count = 0;
        for (int i = 0; i < 10000; i++) {
            for (int j = 0; j < 65535; j++) {
                b = rand() % 30 + 1;
                if (b > 15) // This can be changed to get statistics for different %-ages
                    count += (b + 10);
            }
        }
        cout << count << "\n";
    }
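The comment in that loop says the threshold can be changed to get different percentages. A minimal sketch of that idea follows, assuming the values are generated up front so that only the if statement is timed. `RunResult` and `runWithThreshold` are hypothetical names, not from the question, and the measured time depends heavily on what the compiler emits for the branch.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper type (not from the question): result of one timed run.
struct RunResult {
    long long count;    // accumulated sum, returned so the loop has a visible effect
    double takenRatio;  // fraction of iterations in which the if body ran
    double millis;      // wall-clock time of the timed loop only
};

// Runs the question's if statement over pre-generated values in 1..30.
// The body runs when b > threshold, i.e. with probability (30 - threshold) / 30,
// so threshold = 27 gives roughly a 10% / 90% split.
RunResult runWithThreshold(int threshold, std::size_t iterations, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(1, 30);
    std::vector<int> values(iterations);
    for (int& v : values) v = dist(gen);  // generated outside the timed region

    long long count = 0;
    std::size_t taken = 0;
    auto start = std::chrono::steady_clock::now();
    for (int b : values) {
        if (b > threshold) {
            count += b + 10;
            ++taken;
        }
    }
    auto stop = std::chrono::steady_clock::now();

    RunResult r;
    r.count = count;
    r.takenRatio = iterations ? static_cast<double>(taken) / iterations : 0.0;
    r.millis = std::chrono::duration<double, std::milli>(stop - start).count();
    return r;
}
```

Calling `runWithThreshold(27, 10000000, 42)` and `runWithThreshold(15, 10000000, 42)` and comparing `millis` gives a rough comparison between a skewed and an even split. If the two times are nearly identical, one possible explanation is that the compiler removed the conditional jump altogether.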

My question is: is there a way to test the scalability and effect of branch prediction with multiple if statements in a real, large application for a given CPU?

Basically, I want to be able to find out how much branch mispredictions cost on different processors and what impact they have on the application.


c++ performance branch-prediction cpu

owagh 10 sept. '12 at 15:59


1 answer

Louis ricci · Answer 1 · 2012-09-10T18:04:59+0000

You need to take the complexity of your branches into account: the compiler may eliminate a branch entirely by using architecture-specific instructions such as CMOV (conditional move).

Simple code example

 if (b > 15) count += (b+10);

Here is that code compiled to assembly, with and without a branch:

    ;; assembly x86 FASM/NASM syntax

    ;; WITH branching
    MOV ebx, [b]                ;; b
    MOV ecx, [count]            ;; count
    CMP ebx, 15                 ;; if condition to set flags
    JLE .skip                   ;; { branch/jump over the if body when less than or equal
    LEA eax, [ecx + ebx + 10]   ;; count + b+10
    MOV [count], eax            ;; store count
    .skip:                      ;; } label after the if block

    ;; WITHOUT branching
    MOV ebx, [b]                ;; b
    MOV ecx, [count]            ;; count
    LEA eax, [ecx + ebx + 10]   ;; pre-calc avoiding the need to branch
    CMP ebx, 15                 ;; if condition to set flags
    CMOVLE eax, ecx             ;; make eax equal to ecx (current count) when less than or equal
                                ;; avoiding the branch/jump
    MOV [count], eax            ;; store count
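The same two shapes can be written at the C++ source level. This is only a sketch: the function names are hypothetical, and nothing guarantees which form a given compiler turns into a conditional jump or a CMOV, so the generated assembly still has to be inspected.

```cpp
// Branching form: the if statement as written in the question.
long addIfBranchy(long count, int b) {
    if (b > 15) count += b + 10;
    return count;
}

// Select form: compute the candidate first, then pick one of two values.
// This mirrors the LEA + CMOVLE listing, but the compiler is free to emit a jump instead.
long addIfSelect(long count, int b) {
    long candidate = count + b + 10;
    return (b > 15) ? candidate : count;
}

// Arithmetic form: the comparison yields 0 or 1, which scales the increment.
long addIfArithmetic(long count, int b) {
    return count + static_cast<long>(b > 15) * (b + 10);
}
```

All three return the same value for every input; they differ only in how strongly they hint at a branch-free encoding. An optimizing compiler may well produce identical machine code for all of them.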

So unless you know how the optimizing compiler has transformed your code, it is difficult to profile branch prediction. If you inspect the generated machine code and see that it contains many J[condition] (conditional jump) instructions, then the code profiling tool mentioned in the comments is sufficient. Trying to roll your own branch prediction test without using the proper architecture debug registers will lead to the situation demonstrated above, where the branch you meant to measure may no longer exist.
