to see if branch prediction really slows down my program or helps
Branch prediction does not slow programs down. When people talk about the cost of mispredictions, they mean how much more expensive a mispredicted branch is than a correctly predicted one.
If branch prediction did not exist, every branch would be as expensive as a mispredicted one.
So what “a misprediction penalty of 10 to 20 clock cycles” really means is that a successful branch prediction saves 10 to 20 cycles.
Removing branches not only improves performance during code execution, but also helps the compiler optimize code.
Then why use branch prediction instead of removing branches?
You should not treat one as a substitute for the other. If the compiler can remove branches, it will (provided that optimization is enabled), and if programmers can remove branches (provided that this does not harm readability, or that the code in question is performance-critical), they should.
That hardly makes branch prediction useless, though. Even if you remove as many branches as possible from a program, it will still contain many, many branches. Given that, and given how expensive mispredicted branches are, branch prediction is essential for good performance.
Is there a way to get the compiler to generate assembly code without branches?
An optimizing compiler will already remove branches from the program where possible (without changing the program's semantics), but unless we are talking about a trivial program of the int main() {return 0;} kind, it is impossible to remove all of them. Loops require branches (unless they are unrolled, which only works if the number of iterations is known ahead of time), and so do most if and switch statements. If you can minimize the number of ifs, switches, and loops in your program, great, but you cannot remove all of them.
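As a rough illustration of what removing a branch by hand can look like, here is a minimal sketch in C. The function names and the threshold parameter are made up for this example. The second version replaces the if with arithmetic on the result of the comparison. Whether either version ends up with a conditional jump in the generated assembly depends on the compiler and its optimization settings, and both versions still contain the loop's own branch.

```c
#include <stddef.h>

/* Branching version: the if is a data-dependent branch, taken or not
   depending on each element (unless the compiler rewrites it). */
long sum_above_branchy(const int *data, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Branchless version: the comparison evaluates to 0 or 1, so the
   multiplication adds either 0 or the element itself. The loop
   condition is still a branch, but a very predictable one. */
long sum_above_branchless(const int *data, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)data[i] * (data[i] >= threshold);
    return sum;
}
```

Both functions return the same result for the same input. Only the shape of the code handed to the compiler differs, and the first version is arguably the more readable one.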
or disable branch prediction in the CPU, so I can compare both results?
As far as I know, it is impossible to disable branch prediction on x86 or x86-64 processors. And, as I said, doing so would never improve performance (it might make performance more predictable, but that is usually not a requirement in the contexts where these processors are used).
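Since the predictor cannot be switched off, one way to still see its effect is to run the same branchy loop over data the predictor handles well (sorted) and data it handles badly (random order). The following is only a sketch under assumptions: all function names are invented for this example, timing uses the standard clock() function, and the values are limited to 0..255 with a cutoff of 128 purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Comparison callback for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fill buf with pseudo-random values in [0, 255]. */
void fill_random(int *buf, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        buf[i] = rand() % 256;
}

/* Sort buf in ascending order so the branch below becomes predictable. */
void sort_ints(int *buf, size_t n)
{
    qsort(buf, n, sizeof *buf, cmp_int);
}

/* The loop under test: one data-dependent branch per element. */
long sum_large(const int *buf, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] >= 128)
            sum += buf[i];
    return sum;
}

/* Run reps passes over buf; return the CPU time in seconds and
   store the accumulated sum in *out so the work is not optimized away. */
double time_passes(const int *buf, size_t n, int reps, long *out)
{
    long total = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        total += sum_large(buf, n);
    clock_t end = clock();
    *out = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, fill an array with fill_random, call time_passes on it, then call sort_ints and time_passes again. The two sums must match, and any difference in time comes from how well the branch is predicted. Note that at higher optimization levels the compiler may turn the if into a conditional move or vectorize the loop, in which case the difference can disappear. On Linux, perf stat -e branch-misses can report the misprediction counts directly.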