Here is a fairly simple piece of code, compiled with -O2 (gcc 4.8.5):

    unsigned char * linebuf;

    int yuyv_tojpegycbcr(unsigned char * buf, int w)
    {
        int col;
        unsigned char * restrict pix = buf;
        unsigned char * restrict line = linebuf;

        for(col = 0; col < w - 1; col +=2)
        {
            line[col*3] = pix[0];
            line[col*3 + 1] = pix[1];
            line[col*3 + 2] = pix[3];
            line[col*3 + 3] = pix[2];
            line[col*3 + 4] = pix[1];
            line[col*3 + 5] = pix[3];
            pix += 4;
        }
        return 0;
    }

and here is the corresponding assembly:

    0000000000000000 <yuyv_tojpegycbcr>:
       0:   83 fe 01                cmp    $0x1,%esi
       3:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax

When compiling without the restrict qualifier, the output is identical: lots of intermixed loads and stores. Some values are loaded twice, and it looks like no optimization has happened. If pix and line do not alias, I expect the compiler to be smart enough to, among other things, load pix[1] and pix[3] only once.

Do you know of anything that could defeat the restrict qualifier?
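
To make the expectation concrete, here is a rough sketch of the loop written by hand the way I would hope the optimizer treats it when pix and line do not alias: each source byte is read once into a local, then only stores follow. The name yuyv_tojpegycbcr_manual and its parameter list are hypothetical and only for illustration; this is not what gcc emits, and the destination is passed in as an argument instead of using the global linebuf.

```c
/* Hypothetical hand-written variant, for illustration only.
 * Each source byte is loaded exactly once per iteration, so no store
 * to line[] can force pix[1] or pix[3] to be reloaded. */
static void yuyv_tojpegycbcr_manual(const unsigned char *pix,
                                    unsigned char *line, int w)
{
    int col;

    for (col = 0; col < w - 1; col += 2) {
        /* Four loads: Y0, U, Y1, V of one YUYV pixel pair. */
        unsigned char y0 = pix[0];
        unsigned char u  = pix[1];
        unsigned char y1 = pix[2];
        unsigned char v  = pix[3];

        /* Six stores: two Y/Cb/Cr triplets, same layout as the original loop. */
        line[col * 3]     = y0;
        line[col * 3 + 1] = u;
        line[col * 3 + 2] = v;
        line[col * 3 + 3] = y1;
        line[col * 3 + 4] = u;
        line[col * 3 + 5] = v;

        pix += 4;
    }
}
```

This writes the same bytes as the original loop, so the two versions can be compared in the disassembly to see whether the duplicate loads disappear.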
PS: With a newer gcc (4.9.2), on a different architecture (ARM v7), the result is similar. Here is a test script to compare the generated code with and without restrict.

    #!/bin/sh
    gcc -c -o test.o -std=c99 -O2 yuyv_to_jpegycbcr.c
    objdump -d test.o > test.S
    gcc -c -o test2.o -O2 -D restrict='' yuyv_to_jpegycbcr.c
    objdump -d test2.o > test2.S