x86_64: Is it possible to "inline-substitute" PLT / GOT references?

I'm not sure what a good subject line for this question would be, but here we go ...

To enforce locality / compactness for a critical section of code, I'm looking for a way to call a function in an external (dynamically loaded) library through a "jump slot" (an ELF R_X86_64_JUMP_SLOT relocation) directly at the call site - that is, what the linker normally puts into the PLT / GOT, but inlined right at the call site.

If I emulate the call like this:

 #include <stdio.h>

 int main(int argc, char **argv)
 {
         asm ("push $1f\n\t"        /* push the return address (label 1) */
              "jmp *0f\n\t"         /* jump through the 64-bit slot at label 0 */
              "0: .quad %P0\n"      /* the slot: address of printf */
              "1:\n\t"
              : : "i"(printf), "D"("Hello, World!\n"));
         return 0;
 }

to hardcode the space for a 64-bit word, the call itself works. (Please, no comments about this being a lucky coincidence because it breaks several ABI rules - none of that is the subject of this question, and in my case it can be worked around / addressed in other ways. I'm trying to keep this example brief.)

It creates the following assembly:

 0000000000000000 <main>:
    0:  bf 00 00 00 00          mov    $0x0,%edi
                         1: R_X86_64_32  .rodata.str1.1
    5:  68 00 00 00 00          pushq  $0x0
                         6: R_X86_64_32  .text+0x19
    a:  ff 24 25 00 00 00 00    jmpq   *0x0
                         d: R_X86_64_32S .text+0x11
         ...
                         11: R_X86_64_64 printf
   19:  31 c0                   xor    %eax,%eax
   1b:  c3                      retq
But (because printf is used as the immediate, I guess ...?) the target address here is still that of the PLT hook - the same R_X86_64_64 reloc. Linking the object file against libc into an actual executable results in:

 0000000000400428 <printf@plt>:
   400428:  ff 25 92 04 10 00       jmpq   *1049746(%rip)    # 5008c0 <_GLOBAL_OFFSET_TABLE_+0x20>
 [...]
 0000000000400500 <main>:
   400500:  bf 0c 06 40 00          mov    $0x40060c,%edi
   400505:  68 19 05 40 00          pushq  $0x400519
   40050a:  ff 24 25 11 05 40 00    jmpq   *0x400511
   400511:  [.quad 400428]
   400519:  31 c0                   xorl   %eax,%eax
   40051b:  c3                      retq
 [...]
 DYNAMIC RELOCATION RECORDS
 OFFSET           TYPE               VALUE
 [...]
 00000000005008c0 R_X86_64_JUMP_SLOT printf

I.e. this still gives a two-step redirection: execution is first transferred to the PLT hook, which then jumps to the library entry point.

Is there a way to instruct the compiler / assembler / linker to - in this example - "inline" the jump slot target at address 0x400511? That is, to replace the "local" R_X86_64_64 reloc (resolved at program link time by ld) with the "remote" R_X86_64_JUMP_SLOT one (resolved at program load time by ld.so), and to force non-lazy binding for this section of code? Maybe linker mapfiles can make this possible - if so, how?

Edit:
To make this clear, the question is about how to achieve this in a dynamically linked executable / for an external function that is only available in a dynamic library. Yes, it is true that static linking solves this in a simpler way, but:

  • there are systems (e.g. Solaris) where static libraries are generally not shipped by the vendor
  • there are libraries that are available neither as source code nor as static versions

Therefore, static linking does not help here :(

Edit2:
I found that on some architectures (notably SPARC, see the section on SPARC relocations in the GNU as manual), the GNU assembler can create specific types of relocation references for the linker in place, using modifiers. The SPARC one quoted there uses %gdop(symbolname) to make the assembler emit an instruction to the linker saying "create this relocation right here". Intel's assembler on Itanium knows the @fptr(symbol) link-relocation operator for the same kind of thing (see also section 4 of the Itanium psABI). But does an equivalent mechanism - something that tells the assembler to emit a specific linker relocation type at a specific position in the code - exist for x86_64?

I also found that the GNU assembler has a .reloc directive, which is supposedly meant for this purpose; still, if I try:

 #include <stdio.h>

 int main(int argc, char **argv)
 {
         asm ("push %%rax\n\t"                 /* reserve a stack slot, saving rax */
              "lea 1f(%%rip), %%rax\n\t"       /* rax = return address (label 1) */
              "xchg %%rax, (%%rsp)\n\t"        /* store it on the stack, restore rax */
              "jmp *0f\n\t"                    /* jump through the slot at label 0 */
              ".reloc 0f, R_X86_64_JUMP_SLOT, printf\n\t"
              "0: .quad 0\n"
              "1:\n\t"
              : : "D"("Hello, World!\n"));
         return 0;
 }

I get an error from the linker (note that 7 == R_X86_64_JUMP_SLOT ):

  error: /tmp/cc6BUEZh.o: unexpected reloc 7 in object file 
The assembler creates an object file for which readelf says:
 Relocation section '.rela.text.startup' at offset 0x5e8 contains 2 entries:
      Offset             Info             Type             Symbol Value     Symbol Name + Addend
 0000000000000001  000000050000000a R_X86_64_32          0000000000000000 .rodata.str1.1 + 0
 0000000000000017  0000000b00000007 R_X86_64_JUMP_SLOT   0000000000000000 printf + 0
This is what I want - but the linker does not accept it.
The linker does accept R_X86_64_64 in place of the above; doing that creates the same binary as in the first case ... redirecting to printf@plt, not to the "resolved" address.
+6
2 answers

To inline the call, you would need a code (.text) relocation whose result is the final address of the function in the dynamically loaded shared library. No such relocation exists (and modern static linkers do not allow them) on x86_64 using the GNU toolchain for GNU / Linux, so you cannot inline the entire call the way you want.

The closest you can get is a direct call through the GOT (this avoids the PLT):

         .section        .rodata
 .LC0:
         .string "Hello, World!\n"
         .text
         .globl  main
         .type   main, @function
 main:
         pushq   %rbp
         movq    %rsp, %rbp
         movl    $.LC0, %eax
         movq    %rax, %rdi
         call    *printf@GOTPCREL(%rip)
         nop
         popq    %rbp
         ret
         .size   main, .-main

This should generate an R_X86_64_GLOB_DAT relocation against printf in the GOT, which the sequence above then uses. You need to avoid inline asm in C code here, because in general the compiler may keep live values in any number of caller-saved registers around the asm statement. That forces you either to save and restore all such registers around the asm function call, or to risk corrupting them for later use in the wrapper function. It is therefore easier to write the wrapper in pure assembly.
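
For comparison, a similar single-indirection call can be sketched in plain C, without inline asm, by keeping a pointer to the function in a data word and calling through it. This is only a sketch under assumptions: `puts_slot` and `say_via_slot` are hypothetical names, `puts` stands in for `printf` to keep the signature simple, and whether the slot ends up holding the library address or a PLT address depends on how the executable is built (for example PIE versus non-PIE), so it is worth checking the result with `readelf -r` and `objdump -d`.

```c
#include <stdio.h>

/* Hypothetical names for illustration: puts_fn, puts_slot, say_via_slot. */
typedef int (*puts_fn)(const char *);

/*
 * A data word holding the address of puts. It lives in a writable data
 * section, so the toolchain is free to fill it in at link or load time.
 * "volatile" keeps the compiler from folding the indirect call back into
 * a direct call to puts.
 */
static puts_fn volatile puts_slot = puts;

/* Load the pointer from the slot once and call through it. */
int say_via_slot(const char *msg)
{
    puts_fn fn = puts_slot;   /* one load from the slot */
    return fn(msg);           /* one indirect call */
}
```

Calling `say_via_slot("Hello, World!")` from `main` prints the string followed by a newline. Because the compiler generates the call itself, it also takes care of the caller-saved registers.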

Another option is to link with -Wl,-z,now -Wl,-z,relro , which ensures that the PLT and the PLT-related GOT entries are resolved at startup, improving the locality and compactness of the code. With full RELRO, you only have to run code in the PLT and access data in the GOT, two things that should already be somewhere in the cache hierarchy of the logical core. If full RELRO is enough to meet your needs, you do not need wrappers, and you get added security benefits.

The best options are really static linking or LTO, if they are available to you.

+2

You can link the executable statically. Just add -static to the final link command, and all your indirect jumps will be replaced by direct calls.

-1

Source: https://habr.com/ru/post/917173/

