Performance degradation for large C++ DLLs with auto-generated C code

I am working on software that needs to call a family of optimization solvers. Each solver is an automatically generated piece of C code, thousands of lines long. I use 200 of these solvers, which differ only in the size of the optimization problem to be solved.

In total, these auto-generated solvers amount to approximately 180 MB of C code, which I compile from C++ using the extern "C" { /*200 solvers' headers*/ } syntax in Visual Studio 2008. Compiling all of this is very slow (using "maximum speed /O2", it takes about 8 hours). For this reason, I thought it would be nice to compile the solvers into a single DLL, which I can then call from separate software (which would have a reasonable compile time and would let me abstract this whole extern "C" business away from the higher-level code). The compiled DLL is then about 37 MB.

The problem is that when I execute one of these solvers through the DLL, the call takes about 30 ms. If instead I compile only the one solver I need into the DLL and call it from the same program, the call is about 100 times faster (< 1 ms). Why is this? Can I work around it?

The DLL is as follows. Each solver uses the same structures (i.e., they have the same member variables), but they have different names, hence all the casting.

    extern "C" {
        #include "../Generated/include/optim_001.h"
        #include "../Generated/include/optim_002.h"
        /* etc. */
        #include "../Generated/include/optim_200.h"
    }

    namespace InterceptionTrajectorySolver
    {
        __declspec(dllexport) InterceptionTrajectoryExitFlag
        SolveIntercept(unsigned numSteps, InputParams params, double* optimSoln, OutputInfo* infoOut)
        {
            int exitFlag;
            switch (numSteps)
            {
            case 1:
                exitFlag = optim_001_solve((optim_001_params*) &params, (optim_001_output*) optimSoln, (optim_001_info*) &infoOut);
                break;
            case 2:
                exitFlag = optim_002_solve((optim_002_params*) &params, (optim_002_output*) optimSoln, (optim_002_info*) &infoOut);
                break;
            /* ... etc. ... */
            case 200:
                exitFlag = optim_200_solve((optim_200_params*) &params, (optim_200_output*) optimSoln, (optim_200_info*) &infoOut);
                break;
            }
            return exitFlag;
        }
    }
3 answers

I do not know whether your code is inlined into each case branch in the example. If your functions are inlined and you put all of this into one function, it will be much slower, because the code is spread out across a lot of virtual memory and the CPU will have to make many jumps as the code executes. If it is not all inlined, perhaps these suggestions may help.

Your solution could be improved by ...

A) 1) Split the project into 200 separate DLLs. Then make a .bat file or similar to build them all. 2) Export a function from each DLL called "MyEntryPoint", and use dynamic linking to load a library as needed. This is then equivalent to a music program that loads plugins: lots of small DLLs. Get a function pointer to the entry point with GetProcAddress.

Or...

B) Build each solver as a separate .lib file. Each one then compiles very quickly on its own, and you can link them all together. Create an array of function pointers to all the functions and call through that instead.

 result = SolveIntercept[whichStep](/* args */);

Linking all the .lib files into one large library should not take eight hours. If it does, you are doing something very wrong.

and...

Also try putting the code into separate actual .cpp files. The compiler may well do a better job when they are in different translation units. Then, once each unit has been compiled, it stays compiled as long as you do not change anything in it.


Make sure that you measure and average multiple timed calls to the optimizer, because the slow result may come from high one-time setup costs before the first call.

Then also check what that 200-branch conditional (your switch) does to your performance! Try eliminating the switch for a test: call only one solver from the test project, but still link all of them into the DLL. Do you still see the slow performance?


I assume that the reason you generate the code is to improve runtime performance, as well as correctness. I do the same thing.

I suggest you try this technique to find out what the runtime performance problem is.

If you see a 100:1 performance difference, then every time you interrupt the program and look at its state, there is a 99% chance you will see the problem.

As for the build time, it definitely makes sense to modularize it. None of this should have a big impact on runtime, unless it means you end up doing crazy I/O.


Source: https://habr.com/ru/post/1432883/
