How can I unit test performance optimizations in C?

I worked on a portable C library that processes images.

I spent quite a bit of time on a couple of low-level functions to take advantage of GCC's automatic vectorization (SSE and/or AVX, depending on the target processor), while preserving reasonably portable C code (extensions used: restrict and __builtin_assume_aligned).
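For illustration, a function of the kind described might look like this. This is a hypothetical sketch, not the asker's actual code: the function name, the saturating-add operation, and the 32-byte alignment are all assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical example: saturating add of two pixel rows.
 * The restrict qualifiers and the __builtin_assume_aligned hints
 * (GCC/Clang extensions) let the compiler emit SSE/AVX code for the
 * loop; compile with -O3 -fopt-info-vec to confirm vectorization. */
void add_rows(uint8_t *restrict dst,
              const uint8_t *restrict a,
              const uint8_t *restrict b,
              size_t n)
{
    dst = __builtin_assume_aligned(dst, 32);
    a   = __builtin_assume_aligned(a, 32);
    b   = __builtin_assume_aligned(b, 32);
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = s > 255 ? 255 : (uint8_t)s;  /* clamp to 255 */
    }
}
```

These builtins are exactly the sort of thing MSVC handles differently (restrict becomes __restrict, and __builtin_assume_aligned has no direct equivalent), which is what makes regression testing worthwhile here.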

Now it's time to test the code on Windows (MSVC compiler). But before that, I would like to set up some kind of unit testing so that I don't shoot myself in the foot and lose all my carefully chosen optimizations, i.e. so the GCC auto-vectorized code stays as it is.

I could just #ifdef/#endif the whole function body, but I am thinking of a longer-term solution that would catch any regression when updating the compiler(s).

I am pretty comfortable with unit testing in general (there are a lot of good frameworks out there), but I'm much less sure how to unit test such low-level functions. How can I integrate performance testing into a CI service such as Jenkins, for example?

PS: I would like to avoid storing hard-coded timing results tied to a specific processor, for example:

    // start timer:
    gettimeofday(&t1, NULL);
    // call optimized function:
    ...
    // stop timer:
    gettimeofday(&t2, NULL);
    // hard-code some magic number:
    if (t2.tv_sec - t1.tv_sec > 42)
        return EXIT_FAILURE;
3 answers

Your problem basically boils down to two parts:

  • What is the best way to benchmark your carefully optimized code?

  • How to compare benchmark results, so that you can determine whether code changes and/or compiler updates have affected the performance of your code.

Google Benchmark can provide a reasonable approach to problem #1. It is C++, but that would not stop you from calling your C functions from it.

The library can produce summary reports in various formats, including JSON and good old CSV. You could arrange to store these somewhere on each run.

Then you could write a simple Perl/Python/etc. script to compare benchmark results and raise an alarm if they deviate by more than some threshold.
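The check such a script would perform can be sketched in a few lines; here it is in C to match the rest of the thread. The 10% tolerance and the timing values in the usage below are made-up illustrations, not recommendations.

```c
#include <stdio.h>
#include <stdlib.h>

/* Compare a new benchmark result against a stored baseline and fail
 * if it regressed by more than `tolerance` (e.g. 0.10 for 10%).
 * The tolerance is an assumption; tune it to the noise level you
 * actually observe on your CI machines. */
int check_regression(double baseline_ns, double current_ns, double tolerance)
{
    double deviation = (current_ns - baseline_ns) / baseline_ns;
    if (deviation > tolerance) {
        fprintf(stderr, "regression: %.1f%% slower than baseline\n",
                deviation * 100.0);
        return EXIT_FAILURE;  /* non-zero exit status fails the CI job */
    }
    return EXIT_SUCCESS;
}
```

A CI job would parse the baseline and current numbers out of the stored CSV/JSON reports and feed them to a check like this; the process exit status is what Jenkins reacts to.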

One thing you should be careful about is noise in your results caused by factors such as the load on the system running the test. You haven't said much about your test environment, but if it is (for example) a virtual machine on a host containing other virtual machines, then your results may be skewed by what happens in those other VMs.

CI frameworks such as Jenkins let you script the actions to take when tests run, so it is relatively easy to integrate this approach into such a setup.


One way to measure performance in a simple and repeatable way would be to run your benchmarking unit tests under Valgrind/Callgrind. This gives you a number of metrics: CPU cycles, instruction and data read/write accesses (at the different cache levels), bus-lock events, etc. You would then only need to check these values against a known-good baseline.

Valgrind is repeatable because it emulates the execution of the code. This is, of course, (much) slower than running the code directly, but it makes the measurements independent of system load, etc.

Where Valgrind is not available, such as on Windows (although running Windows binaries under valgrind + wine on Linux has been mentioned), DynamoRIO is an option. It provides Valgrind-like tools, such as an instruction counter as well as a memory and cache profiler. (It is also available on Linux, and had apparently been partially ported to OS X at the time of writing.)



Assuming you have good reason to use MSVC, if I were you I would stick to a method as low-level as possible, to reduce outside interference. So even though you might normally prefer a proper testing framework, a simple loop that calls your key functions with predefined parameters, with timers attached at the appropriate places, will give more reliable results than the alternatives. If you can compute a baseline mean and standard deviation of the timing results, you will have a very clear picture of what happens to performance: where and when it gets slower or faster than it should, and so on.


Source: https://habr.com/ru/post/1243585/

