What standard methods exist for using CPU-specific features in a DLL?

Short version: is it possible, and what is the best way, to use processor-specific instructions in a DLL?

Slightly longer version: looking at (32-bit) DLLs, say from Microsoft, it seems that one build fits all processors.

Does this mean they are built for the lowest common denominator (i.e. the minimum platform supported by the OS)? Or is there some method of exporting a single interface from a DLL while using CPU-specific code behind the scenes for optimal performance? And if so, how is it done?

+4
5 answers

I don't know of any standard technique, but if I had to do this, I would write code in DllMain() that detects the processor type and fills in a jump table of function pointers to processor-optimized versions of each function.

There should also be a lowest-common-denominator fallback for when the CPU type is unknown.

You can find the current CPU information in the registry here:

HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor 
+6

A DLL is expected to work on every machine that runs Win32, so in general you are stuck with the i386 instruction set. There is no official method for exposing functionality/code for specific instruction sets; you have to do it yourself, transparently to the caller.

The technique generally used is:

- detect processor features such as MMX and SSE at run time
- if they are present, use them; if not, fall back to generic code

Since you cannot let your compiler emit anything beyond i386, you will have to write the code that uses specific instruction sets in inline assembler. I don't know of higher-level tools for this. Detecting CPU features is straightforward (via CPUID), but that check may also need to be done in assembly.

+2
source

An easy way to get SSE/SSE2 optimization is simply to use the /arch argument with MSVC. I would not worry about a fallback; there is no reason to support anything older unless you have a very niche application.

http://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx

I believe gcc/g++ have equivalent flags.
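For reference, a sketch of the corresponding invocations (mylib.cpp is a placeholder name; the MSVC flag is documented at the MSDN link above, and /LD builds a DLL):

```shell
# MSVC: allow the compiler to emit SSE2 instructions throughout
cl /LD /arch:SSE2 mylib.cpp

# GCC/Clang equivalents: set an instruction-set baseline explicitly,
# or tune for whatever the build machine supports
g++ -shared -msse2 -o mylib.so mylib.cpp
g++ -shared -march=native -o mylib.so mylib.cpp
```

Note that these raise the baseline for the whole binary: the resulting DLL will simply crash on a CPU without the chosen extensions, which is the trade-off this answer is accepting.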

+1

The DLL files you get from Microsoft are built for the generic x86 architecture for the simple reason that they must work across a multitude of machines.

Up until Visual Studio 6.0 (I don't know if this has changed since), Microsoft optimized its DLLs for size, not speed. Reducing the overall size of a DLL gave better performance than any other optimization the compiler could generate, because the speedup from micro-optimization is small compared to the speedup from the CPU not stalling on memory. Real speed improvements come from reduced I/O or a better underlying algorithm.

Only the few critical loops at the core of a program benefit from micro-optimization, simply because of the sheer number of times they execute; perhaps 5-10% of your code falls into this category. You can be fairly confident that such critical loops have already been hand-optimized in assembler by Microsoft's engineers, beyond what the compiler would produce. (I know that is asking a lot, but I hope they do it.)

So a larger DLL containing extra versions of the code tuned for different architectures would mostly be a drawback, since most of that code is rarely executed and is never part of the critical code that consumes the bulk of your CPU cycles.

+1

Intel's ICC can compile code twice, for different architectures, so you can have your cake and eat it too. (OK, you get two cakes: your DLL will be bigger.) Even MSVC 2005 can do this in very specific cases (e.g. memcpy() can use SSE4).

There are several ways to switch between the versions. A DLL gets loaded because the loading process needs functions from it, and the function names are resolved to addresses. One option is to make that lookup depend not only on the function name but also on the CPU features. Another exploits the fact that name-to-address resolution goes through a pointer table as an intermediate step: you can swap out the whole table. Or you can even branch inside the critical functions themselves, so that foo() calls foo__sse4 when that is faster.

+1

Source: https://habr.com/ru/post/1277143/

