How are ntoh functions implemented in RHEL / GCC?

A production problem led our team to the following questions:

  • How are ntohs and ntohl implemented in RHEL 6 with GCC 4.4.6?
  • Are the implementations known to be fast or slow?
  • How can I see the generated assembly code for these functions?

I know the implications behind these questions may seem far-fetched and ridiculous, but I was asked to investigate.

The hardware is a 64-bit, little-endian Intel processor, and the code is compiled as 64-bit.

4 answers
  • They are provided by glibc, not GCC. See /usr/include/bits/byteswap.h for the __bswap_16 and __bswap_32 macros, which are used when optimization is enabled (see <netinet/in.h>).
  • You didn't say which architecture you are using. On a big-endian system they are no-ops, so optimally fast! On little-endian they are architecture-specific, hand-optimized assembly code.
  • Use GCC's -save-temps option to keep the intermediate .s files, or use -S to stop after compilation and before assembling the code, or use http://gcc.godbolt.org/

Follow these steps:

test.c

 #include <arpa/inet.h>

 int main()
 {
     volatile uint32_t x = 0x12345678;
     x = ntohl(x);
     return 0;
 }

Then compile with:

 $ gcc -O3 -g -save-temps test.c 

Then look at the resulting test.s file, or alternatively run objdump -S test.o

On my machine (Ubuntu 13.04) the relevant assembly is:

 movl $305419896, 12(%esp)
 movl 12(%esp), %eax
 bswap %eax
 movl %eax, 12(%esp)

Notes:

  • 305419896 is 0x12345678 in decimal.
  • 12(%esp) is the address of the volatile variable.
  • All the movl instructions are there because of the volatile qualifier. The only really interesting instruction is bswap.
  • Obviously, ntohl was compiled as an inline intrinsic.

Also, if I look at test.i (the preprocessed output), I find that ntohl is #defined as simply __bswap_32(), which is an inline function containing just a call to __builtin_bswap32().


They are implemented in glibc. See /usr/include/netinet/in.h. They most likely rely on the glibc byteswap macros (/usr/include/bits/byteswap.h on my machine).

In my header they are implemented in assembly, so they should be pretty fast. For constants, the swap is done at compile time.
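
To illustrate what a compile-time swap of a constant can look like, here is a minimal sketch using plain shifts and masks. SKETCH_SWAP32_CONST and swapped_magic are hypothetical names made up for this example; the real glibc header may be written differently.

```c
#include <stdint.h>

/* Hypothetical macro for illustration only; not copied from glibc. */
#define SKETCH_SWAP32_CONST(x)            \
    ((((x) & 0xff000000u) >> 24) |        \
     (((x) & 0x00ff0000u) >>  8) |        \
     (((x) & 0x0000ff00u) <<  8) |        \
     (((x) & 0x000000ffu) << 24))

/* A file-scope initializer must be a constant expression in C, so the
   compiler has to evaluate this swap itself; no code runs at run time. */
static const uint32_t swapped_magic = SKETCH_SWAP32_CONST(0x12345678u);
```

Because the macro is built only from shifts, masks and ORs on a literal, the compiler can fold it completely, which is the effect the answer describes for constant arguments.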


GCC/glibc inlines ntohl() and htonl() into the calling code, so there is no function call overhead. In addition, each call to ntohl() or htonl() is translated into a single bswap assembly instruction. According to the Intel® 64 and IA-32 Architectures Optimization Reference Manual, bswap has both a latency and a throughput of 1 on all modern Intel processors. Thus, executing ntohl() or htonl() takes only one CPU clock cycle.

ntohs() and htons() are implemented as a rotation by 8 bits. This effectively swaps the two halves of the 16-bit operand. Latency and throughput are similar to those of bswap.
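
The 8-bit rotation can be written in C roughly like this. It is a sketch of the technique only, not the glibc code, and sketch_swap16 is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: rotating a 16-bit value by 8 bits exchanges
   its high and low bytes, which is what a 16-bit byte swap needs. */
static inline uint16_t sketch_swap16(uint16_t x)
{
    /* x is promoted to int, so the shifts cannot overflow;
       the cast discards everything above bit 15. */
    return (uint16_t)((x << 8) | (x >> 8));
}
```

For example, sketch_swap16(0x1234) yields 0x3412, and applying it twice returns the original value. Optimizing compilers commonly recognize this pattern and emit a single rotate instruction for it on x86, though that depends on the compiler and flags.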


Source: https://habr.com/ru/post/950644/

