How good are linkers at optimizing functions that return early?

In C, if I have a function call that looks like

    // main.c
    ...
    do_work_on_object(object, arg1, arg2);
    ...

    // object.c
    void do_work_on_object(struct object_t *object, int arg1, int arg2)
    {
        if (object == NULL)
        {
            return;
        }
        // do lots of work
    }

then the compiler will generate a lot of code in main.o to save state, pass the parameters (hopefully in registers in this case) and restore state.

However, at link time it could be noticed that arg1 and arg2 are not used on the early-return path, so the cleanup and state restoration could be short-circuited. Do linkers typically do this kind of thing automatically, or does link-time optimization (LTO) need to be enabled for it to work?

(Yes, I could inspect the disassembled code, but I'm interested in the behavior of compilers and linkers in general and on several architectures, so I hope to learn from the experience of others.)

Assuming profiling shows that this function call is worth optimizing, should I expect the following code to be noticeably faster (for example, without needing LTO)?

    // main.c
    ...
    if (object != NULL)
    {
        do_work_on_object(object, arg1, arg2);
    }
    ...

    // object.c
    void do_work_on_object(struct object_t *object, int arg1, int arg2)
    {
        assert(object != NULL); // generates no code in release build
        // do lots of work
    }
+6
2 answers

Since compiler/linker support for this is not widespread, you can write your code in a way that gets even more of the benefit, at the cost of splitting the logic of your function across two places.

If you have a fast path that does not take much code but is taken often enough to matter, put that part in a header so that it gets inlined, falling back to a call to the rest of the function (which you make private, so it can assume that all the checks in the inlined part have already been done).
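As a rough sketch of that split applied to the question's function (shown in a single file for brevity): the name do_work_on_object_slow, the total field and the slow_path_calls counter are hypothetical and exist only so the example does something observable.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in object.h ---- */
struct object_t {
    int total;  /* hypothetical field, only here so the sketch has an effect */
};

/* Out-of-line slow path (hypothetical name); callers are expected to go
   through the inline wrapper below rather than call this directly. */
void do_work_on_object_slow(struct object_t *object, int arg1, int arg2);

/* Inline fast path: the NULL check is visible at every call site, so the
   compiler can skip the call and its argument setup without needing LTO. */
static inline void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
    if (object == NULL) {
        return;
    }
    do_work_on_object_slow(object, arg1, arg2);
}

/* ---- would live in object.c ---- */
int slow_path_calls = 0;  /* hypothetical counter, for illustration only */

void do_work_on_object_slow(struct object_t *object, int arg1, int arg2)
{
    assert(object != NULL);  /* already checked by the inline wrapper */
    slow_path_calls++;
    object->total += arg1 + arg2;  /* stands in for "do lots of work" */
}
```

Calling do_work_on_object(NULL, 1, 2) returns without ever reaching the out-of-line function, while a call with a valid object goes through to the slow path as before.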

For example, par2's routine that processes a block of data has a fast path for when the galois16 factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is multiplication in Galois16 and += is GF16 addition, that is, bitwise XOR.)

Note that the commit in question renames the old function to InternalProcess and adds a new template<class g> inline bool ReedSolomon<g>::Process, which checks for the fast path and otherwise calls InternalProcess. (It also makes a bunch of unrelated whitespace changes and adds some ifdefs. It was originally committed to CVS in 2006.)
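A minimal C sketch of that zero-factor fast path, not par2's actual code: the function names and signatures are hypothetical, and the shift-and-add multiply assumes the field polynomial 0x1100B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiplication in GF(2^16).
   Assumption: the reducing polynomial is 0x1100B. */
static uint16_t gf16_mul(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t aa = a;
    while (b != 0) {
        if (b & 1u) {
            result ^= aa;      /* addition in GF(2^16) is XOR */
        }
        aa <<= 1;
        if (aa & 0x10000u) {
            aa ^= 0x1100Bu;    /* reduce modulo the field polynomial */
        }
        b >>= 1;
    }
    return (uint16_t)result;
}

/* Out-of-line part: may assume the inline wrapper already handled factor == 0. */
static void internal_process(uint16_t factor, const uint16_t *src,
                             uint16_t *dst, size_t n)
{
    assert(factor != 0);
    for (size_t i = 0; i < n; i++) {
        dst[i] ^= gf16_mul(factor, src[i]);
    }
}

/* Inline wrapper: dst[i] ^= 0 * src[i] would change nothing, so skip the
   whole block. Returns 1 if work was done, 0 if the fast path was taken. */
static inline int process(uint16_t factor, const uint16_t *src,
                          uint16_t *dst, size_t n)
{
    if (factor == 0) {
        return 0;
    }
    internal_process(factor, src, dst, n);
    return 1;
}
```

With a zero factor the destination block is left untouched and no loop runs; any other factor falls through to the out-of-line routine.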

The comment in the commit claims only an 8% speed gain for repairing.

+2

Neither the state-setup code nor the cleanup code can be short-circuited, because the resulting compiled code is static and does not know what will happen when the program runs. The compiler will therefore always have to set up the whole parameter stack.

Think of two situations: in one, object is NULL, and in the other it is not. How would the assembly code know whether to put the other arguments on the stack? Moreover, the caller is responsible for placing the arguments in the right place (stack or registers).

0

Source: https://habr.com/ru/post/985369/

