What disables GCC's __restrict__ qualifier?

Here is a fairly simple piece of code, compiled with -O2 (gcc 4.8.5):

    unsigned char * linebuf;

    int yuyv_tojpegycbcr(unsigned char * buf, int w)
    {
        int col;
        unsigned char * restrict pix = buf;
        unsigned char * restrict line = linebuf;

        for(col = 0; col < w - 1; col +=2)
        {
            line[col*3] = pix[0];
            line[col*3 + 1] = pix[1];
            line[col*3 + 2] = pix[3];
            line[col*3 + 3] = pix[2];
            line[col*3 + 4] = pix[1];
            line[col*3 + 5] = pix[3];
            pix += 4;
        }
        return 0;
    }

and here is the corresponding assembly:

    0000000000000000 <yuyv_tojpegycbcr>:
       0:   83 fe 01                cmp    $0x1,%esi
       3:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # a <yuyv_tojpegycbcr+0xa>
       a:   7e 4e                   jle    5a <yuyv_tojpegycbcr+0x5a>
       c:   83 ee 02                sub    $0x2,%esi
       f:   31 d2                   xor    %edx,%edx
      11:   d1 ee                   shr    %esi
      13:   48 8d 74 76 03          lea    0x3(%rsi,%rsi,2),%rsi
      18:   48 01 f6                add    %rsi,%rsi
      1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      20:   0f b6 0f                movzbl (%rdi),%ecx
      23:   48 83 c2 06             add    $0x6,%rdx
      27:   48 83 c7 04             add    $0x4,%rdi
      2b:   48 83 c0 06             add    $0x6,%rax
      2f:   88 48 fa                mov    %cl,-0x6(%rax)
      32:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
      36:   88 48 fb                mov    %cl,-0x5(%rax)
      39:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
      3d:   88 48 fc                mov    %cl,-0x4(%rax)
      40:   0f b6 4f fe             movzbl -0x2(%rdi),%ecx
      44:   88 48 fd                mov    %cl,-0x3(%rax)
      47:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
      4b:   88 48 fe                mov    %cl,-0x2(%rax)
      4e:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
      52:   88 48 ff                mov    %cl,-0x1(%rax)
      55:   48 39 f2                cmp    %rsi,%rdx
      58:   75 c6                   jne    20 <yuyv_tojpegycbcr+0x20>
      5a:   31 c0                   xor    %eax,%eax
      5c:   c3                      retq

When compiled without restrict, the output is identical: lots of interleaved loads and stores. Some values are loaded twice, and it looks like no optimization has happened. If pix and line do not alias, I would expect the compiler to be smart enough to, among other things, load pix[1] and pix[3] only once.

Do you know of anything that could disable the restrict qualifier?

PS: With a newer gcc (4.9.2) on a different architecture (ARM v7), the result is similar. Here is a test script to compare the generated code with and without restrict.

    #!/bin/sh
    gcc -c -o test.o -std=c99 -O2 yuyv_to_jpegycbcr.c
    objdump -d test.o > test.S
    gcc -c -o test2.o -O2 -D restrict='' yuyv_to_jpegycbcr.c
    objdump -d test2.o > test2.S
1 answer

Put restrict on the function parameters, not on local variables.

In my experience, most compilers (including GCC) honor restrict only when it is specified on function parameters. Any use of it on local variables inside a function is ignored.
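As a minimal sketch of that suggestion, the function from the question could take both buffers as restrict-qualified parameters. The name yuyv_tojpegycbcr_params and the extra line parameter (replacing the global linebuf) are assumptions made for illustration, not part of the original code. Whether a given GCC version then merges the duplicate loads still has to be checked in the disassembly.

```c
/* Hypothetical variant of the question's function: the destination is
   passed in as a parameter so that both pointers can carry restrict in
   the parameter list. The caller promises that the two buffers do not
   overlap. Requires C99 or later for the restrict keyword. */
int yuyv_tojpegycbcr_params(const unsigned char *restrict pix,
                            unsigned char *restrict line, int w)
{
    int col;

    for (col = 0; col < w - 1; col += 2) {
        line[col*3]     = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}
```

A caller would then pass the global explicitly, for example yuyv_tojpegycbcr_params(buf, linebuf, w), and the resulting object file can be compared against the original with the objdump script from the question.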

I suspect this is because alias analysis is performed at the function level rather than at the basic-block level, but I have no evidence to support this. It also probably depends on the compiler and its version.

In any case, these things are too subtle to rely on. So if performance matters, either optimize it manually, or remember to re-check the generated code every time you upgrade or change compilers.
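One way to do the manual optimization is to read each source byte into a local variable before storing anything, so the result no longer depends on what the compiler concludes about aliasing. This is only a sketch: the name yuyv_tojpegycbcr_manual is made up for illustration, and it matches the original function only when the input and output buffers do not overlap.

```c
unsigned char *linebuf;  /* output buffer, as in the question */

/* Hypothetical hand-optimized variant: the four bytes of each YUYV group
   are read exactly once into locals, then written out six times. No
   restrict qualifier is needed for the loads to happen only once. */
int yuyv_tojpegycbcr_manual(const unsigned char *buf, int w)
{
    const unsigned char *pix = buf;
    unsigned char *line = linebuf;
    int col;

    for (col = 0; col < w - 1; col += 2) {
        unsigned char p0 = pix[0];
        unsigned char p1 = pix[1];
        unsigned char p2 = pix[2];
        unsigned char p3 = pix[3];

        line[col*3]     = p0;
        line[col*3 + 1] = p1;
        line[col*3 + 2] = p3;
        line[col*3 + 3] = p2;
        line[col*3 + 4] = p1;
        line[col*3 + 5] = p3;
        pix += 4;
    }
    return 0;
}
```

Since the loads are written explicitly, this form should survive compiler upgrades, though it is still worth confirming with objdump that the generated code is what you expect.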


Source: https://habr.com/ru/post/1244108/
