Processor Clock Measurement

I wrote a program in C as part of a research project. I want to measure exactly how many processor cycles the program consumes: the exact cycle count. Any idea how I can find this?

+4
5 answers

Valgrind's cachegrind tool (valgrind --tool=cachegrind) gives you detailed output, including the number of instructions executed, cache misses, and branch mispredictions. These can be attributed to individual lines of assembly, so in principle (with knowledge of your exact architecture) you could derive an exact cycle count from this output.

Be aware that the count will vary from run to run because of cache effects.

The cachegrind tool is documented in the Valgrind manual.

+11

No, you can't. The concept of a "CPU cycle" is not well defined. Modern chips can run at several clock frequencies, and different parts of the chip can be doing different things at different times.

The question "how many pipeline stages were executed in total?" may be meaningful in some cases, but there is unlikely to be any way to obtain that number.

+1

Try OProfile. It uses the CPU's hardware performance counters to measure the number of instructions executed and the number of cycles elapsed. You can see a usage example in the article Memory Part 7: Memory Performance Tools.
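For a rough idea of what reading such a hardware counter looks like from inside a C program, here is a minimal sketch. It does not use OProfile; it assumes Linux and the perf_event_open system call, which exposes the same kind of counters. The names count_cycles and sample_work are hypothetical, and the call may be refused in containers, virtual machines, or under restrictive kernel settings, in which case the sketch just reports failure.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical workload standing in for the code being measured. */
static void sample_work(void)
{
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 100000; i++)
        sum += i;
}

/* Open a counter for CPU cycles spent in user space by the calling
 * thread, on whichever CPU it runs. Returns a file descriptor or -1. */
static int cycle_counter_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* do not count kernel-mode cycles */
    attr.exclude_hv = 1;      /* do not count hypervisor cycles */
    /* glibc has no wrapper for this call, so go through syscall(). */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Run fn() with the counter enabled and store the count in *cycles.
 * Returns 0 on success, -1 if the counter is unavailable or unreadable. */
static int count_cycles(void (*fn)(void), uint64_t *cycles)
{
    int fd = cycle_counter_open();
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, cycles, sizeof(*cycles));
    close(fd);
    return n == (ssize_t)sizeof(*cycles) ? 0 : -1;
}
```

Calling count_cycles(sample_work, &cycles) a few times and comparing the results shows how much the count moves between runs, even for identical code.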

+1

I'm not quite sure I know exactly what you are trying to do, but on modern x86 processors you can read the time stamp counter (TSC) before and after the block of code you are interested in. At the assembly level this is done with the RDTSC instruction, which returns the TSC value in the edx:eax register pair.
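In C you do not need inline assembly for this. A minimal sketch, assuming GCC or Clang on x86, where the __rdtsc() intrinsic from x86intrin.h issues RDTSC and combines edx:eax into one 64-bit value. The names workload and measure_tsc_ticks are hypothetical. Keep in mind that on recent processors the TSC usually ticks at a constant reference rate, so the difference is in TSC ticks and not necessarily in core clock cycles.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); GCC/Clang on x86 only */

/* Hypothetical workload standing in for the code block of interest. */
static uint64_t workload(uint64_t n)
{
    volatile uint64_t sum = 0;  /* volatile keeps the loop from being optimized away */
    for (uint64_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Read the TSC before and after workload(n) and return the difference.
 * The workload's own result is stored in *result. */
static uint64_t measure_tsc_ticks(uint64_t n, uint64_t *result)
{
    uint64_t start = __rdtsc();
    *result = workload(n);
    uint64_t end = __rdtsc();
    return end - start;
}
```

The returned value includes the overhead of the two reads themselves and anything else that ran on the core in between, so it is an upper bound for the block, not an exact figure.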

Note, however, that this approach has some caveats. For example, if your process starts on CPU0 and ends up on CPU1, each RDTSC result refers to the specific processor core that executed the instruction, so the two readings may not be comparable. (RDTSC is also not a serializing instruction, but in this context I don't think that is much of a problem.)
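Both caveats can be reduced with RDTSCP, a variant of RDTSC that waits for earlier instructions to finish and also returns a per-processor value the operating system stores in IA32_TSC_AUX. The following is a sketch under assumptions: GCC or Clang on x86-64, a CPU that supports RDTSCP, and an OS that gives each logical processor a distinct value there. The names tsc_sample, tsc_sample_now and tsc_delta are hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp(), _mm_lfence(); GCC/Clang on x86-64 */

/* One timestamp plus the processor tag that RDTSCP returned with it. */
typedef struct {
    uint64_t tsc;
    unsigned aux;
} tsc_sample;

/* Take an ordered reading: RDTSCP waits for earlier instructions to
 * execute, and the LFENCE keeps later ones from starting before it. */
static tsc_sample tsc_sample_now(void)
{
    tsc_sample s;
    s.tsc = __rdtscp(&s.aux);
    _mm_lfence();
    return s;
}

/* Store end - start in *ticks and return 1 if both samples carry the
 * same processor tag; return 0 if the thread apparently migrated. */
static int tsc_delta(tsc_sample start, tsc_sample end, uint64_t *ticks)
{
    if (start.aux != end.aux)
        return 0;
    *ticks = end.tsc - start.tsc;
    return 1;
}
```

Take one sample before the block and one after it, and discard the measurement when tsc_delta returns 0. This only detects a migration that is still visible at the second reading; pinning the process to a single core is the more thorough fix.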

+1

Sorry, but no, at least not for most practical purposes: it is simply not possible with most conventional operating systems. For example, quite a few operating systems do not perform a full context switch to handle an interrupt, so time spent servicing an interrupt can, and often will, be counted as time spent in whatever process was running when the interrupt occurred.

The "not for most practical purposes" qualifier refers to the possibility of running your program under a cycle-accurate simulator. These exist, but mostly for processors used primarily in real-time embedded systems, not for anything like a full-blown PC. Worse, they are (usually) not meant to run anything like a full-blown OS, only code that runs on bare metal.

In theory, you could do something with a virtual machine running something like Windows or Linux, but I don't know of any existing virtual machine that attempts this. It would be decidedly non-trivial and would probably have quite serious performance consequences (to put it mildly).

0

Source: https://habr.com/ru/post/1306173/

