How to optimize this piece of code in C

Yesterday in an interview I was asked to test the 5th bit in the number (to check if it was turned on and off) and below, as I did.

int number = 16; int mask = 1<<5; if ((number & mask) == 0) printf("Bit is off"); else printf("its on"); 

but then he asked me to optimize this code and do it without using this particular mask.

So my questions are which mask could I use in this code?

+6
source share
11 answers

Perhaps the interviewer wanted to see how you reacted to a simple call. Or just wanted to know if you really understood C, and you stay in your place? Maybe the interviewer wanted to know if you knew that a non-zero value is true, and therefore test your depth of understanding of C? Or maybe you could make a binary in hex?

IMHO interview about the lot more than in the code. They are expensive. I always tried to get a clear idea of ​​the person, it is difficult to do something by writing a message or even by phone. In the end, some of these people will work with recruit!

The most compact and possibly the fastest:

 int number = 16; // is this really what they gave? printf((number & 0x20)?"its on":"Bit is off"); // did they mean 5th or bit 5? 

Edit:

I encoded the original approach and my alternative and compiled it for ARM Coretx-M3 / 4 (this is the processor I'm writing about now). It was compiled with -O3. Then I parsed each compiled file (using objdump) to get assembler. I did it like this because gcc -S output was huge; which includes a lot of additional information for the assembler and linker, which makes it difficult to find the code.

I replaced printf with dummy_printf to avoid #including stdio.h, which added more noise. Dummy_printf is not exactly the same as printf, it just takes one parameter, but it saves a short output :-)

Source (all 7 files added for easy reading): http://pastebin.com/PTeApu9n

The resulting concatenated objdump output (for each .o) is: http://pastebin.com/kHAmakE3

As you can see, the original is compiled for:

 void original_bit5(int number) { int mask = 1<<5; if ((number & mask) == 0) 0: f010 0f20 tst.w r0, #32 4: d005 beq.n 1a <original_bit5+0x1a> dummy_printf("Bit is off"); else dummy_printf("its on"); 6: f240 0000 movw r0, #0 a: f2c0 0000 movt r0, #0 e: f7ff bffe bw 0 <dummy_printf> void original_bit5(int number) { int mask = 1<<5; if ((number & mask) == 0) dummy_printf("Bit is off"); 12: f240 0000 movw r0, #0 16: f2c0 0000 movt r0, #0 1a: f7ff bffe bw 0 <dummy_printf> 1e: bf00 nop 

I think the dummy_printf call uses a chain of tail calls, i.e. dummy_printf will not return to this function. Very effective!

There is no function input code, since the first four function parameters are passed to the registers r0-r3.

You cannot see the addresses of two lines loaded into r0. This is because it is not connected.

You can see that:

 int mask = 1<<5; if ((number & mask) == 0) 

compiled for:

  0: f010 0f20 tst.w r0, #32 4: d005 beq.n 1a <original_bit5+0x1a> 

So 1<<5 and (... == 0) , are a compiler for a more direct and efficient sequence of instructions. There is a branch for the corresponding call to dummy_printf.

My code compiles to:

 void my_bit5(int number) { dummy_printf((number & 0x20)?"its on":"Bit is off"); 0: f240 0200 movw r2, #0 4: f240 0300 movw r3, #0 8: f010 0f20 tst.w r0, #32 c: f2c0 0200 movt r2, #0 10: f2c0 0300 movt r3, #0 14: bf14 ite ne 16: 4610 movne r0, r2 18: 4618 moveq r0, r3 1a: f7ff bffe bw 0 <dummy_printf> 1e: bf00 nop 

This is also similar to tail call optimization, i.e. returning from this function is impossible, because there is no need for one, returning dummy_printf will return directly to main ()

What you do not see are two registers, r2 and r2 will contain the addresses of two lines. This is because it is not connected.

As you can see, there is a conditional execution instruction 'ite', which loads the parameter register r0 with either register r2 or register r3. So there is no branch in this code.

For a simple pipelined processor, this can be quite efficient. On a simple pipeline processor, a branch can cause a “pipeline stop” while parts of the pipeline are being cleaned. It depends on the processor and processor. Therefore, I assume that gcc got it right and generated a better code sequence than branch execution. I did not check.

Raised by Lundin, I came up with this:

 void union_bit5(int number) { union { int n; struct { unsigned :5; unsigned bit :1; }; } tester; tester.n = number; if (tester.bit) dummy_printf("Bit is on"); else dummy_printf("its off"); } 

It clearly does not include a mask or bit offset. It almost certainly depends on the compiler, you will have to test it to make it work (glerk! - (

gcc for ARM generates the same code (bne vs beq, but which can be adjusted) as an OP solution, so there is no optimization, but it removes the mask:

 void union_bit5(int number) { union { int n; struct { unsigned :5; unsigned bit :1; }; } tester; tester.n = number; if (tester.bit) 0: f010 0f20 tst.w r0, #32 4: d105 bne.n 1a <union_bit5+0x1a> dummy_printf("Bit is on"); else dummy_printf("its off"); 6: f240 0000 movw r0, #0 a: f2c0 0000 movt r0, #0 e: f7ff bffe bw 0 <dummy_printf> void union_bit5(int number) { union { int n; struct { unsigned :5; unsigned bit :1; }; } tester; tester.n = number; if (tester.bit) dummy_printf("Bit is on"); 12: f240 0000 movw r0, #0 16: f2c0 0000 movt r0, #0 1a: f7ff bffe bw 0 <dummy_printf> 1e: bf00 nop 

What is it worth:

 (number & 0x20)? dummy_printf("its on") : dummy_printf("Bit is off"); 

gcc for ARM generates exactly the same code as OP. It generates branches, not conditional instructions.

Summary:

  • Source code compiled into a very efficient instruction sequence
  • The ternary operator ...?...:... can compile code that does not include branches in ARM Cortex-M3 / 4, but can also generate normal branch instructions.
  • In this case, it is difficult to write a more efficient code than the original :-)

I will add, IMHO, the cost of executing printf is so huge compared to the bit test that there is too little worry about optimizing the bit test; he fails Amdahl Law . An appropriate tactic for bit testing is to use -O3 or -Os.


If you want to do something a little perverted (especially for such a trivial problem), but another thing that might make your interlocutor think, create a lookup table for each byte value. (I do not expect it to be faster ...)

 #define BIT5(val) (((val)&0x20)?1:0) const unsigned char bit5[256] = { BIT5(0x00),BIT5(0x01), BIT5(0x02),BIT5(0x03), BIT5(0x04),BIT5(0x05), BIT5(0x06),BIT5(0x07), // ... you get the idea ... BIT5(0xF8),BIT5(0xF9), BIT5(0xFA),BIT5(0xFB), BIT5(0xFC),BIT5(0xFD), BIT5(0xFE),BIT5(0xFF) }; //... if (bit5[(unsigned char)number]) { printf("its on"); } else { printf("Bit is off"); } 

This is a convenient technique if, for example, there are some complex bit patterns, for example, a peripheral register that needs to be converted to a solution or to a switch. This is also O (1)

You could combine the two! -)

+6
source

There are two ways to check the bit:

 if (number & (1 << bit)) { ... } if ((number >> bit) & 1) { ... } 

I think you will be interested: http://graphics.stanford.edu/~seander/bithacks.html

+2
source

Another way is

1: shift the number 5 times to the right so that the 5th bit becomes the 0th on the right (e.g. LSB).
2: Now the logic is numbers with LSB, since 1 is odd and numbers with 0 are equal. Make sure using% 2

If you think mod operations are much more expensive than bit operations, I think it all depends on the compiler. Look at this topic

And faster than integer modulo operation? .

I'm not sure why the interviewer asked you to optimize, maybe he was expecting the module method to respond.

+1
source

Are you sure you should move it 5 bits ? How about this:

 int n = 16; printf ("%d\n", (n >> 4) % 2); 
+1
source

You can use the instructions for testing bit-bit , but it is possible that the compiler will take over what you do, and do it anyway.

In addition, there is really nothing to optimize, and, of course, the only way to see if any of the minor variations of your method can be faster is through a profile.

Here is the code that gcc 4.2.1 -O3 creates for if((number >> 5) & 1)) :

 0000000100000ee0 pushq %rbp 0000000100000ee1 movq %rsp,%rbp 0000000100000ee4 shrl $0x05,%edi 0000000100000ee7 notl %edi 0000000100000ee9 andl $0x01,%edi 0000000100000eec movl %edi,%eax 0000000100000eee leave 0000000100000eef ret 

and for if(number & (1 << 5)) :

 0000000100000ee0 pushq %rbp 0000000100000ee1 movq %rsp,%rbp 0000000100000ee4 shrl $0x05,%edi 0000000100000ee7 notl %edi 0000000100000ee9 andl $0x01,%edi 0000000100000eec movl %edi,%eax 0000000100000eee leave 0000000100000eef ret 

So, we see that at least gcc 4.2.1 produces identical code in these cases, but does not use the instruction for bit-bit.

0
source
 (number & 16)?printf("yes"):printf("no"); 
0
source

A c apprentice newbie try

 int number = 16; if(16 == number&(0x10)) puts("true"); else puts("false"); 
0
source

All are shifted to the right. I want to be original and shift to the left:

 #define INDEX 5 int number = 16; if (number<<(sizeof(number)*8-INDEX-1)<0) printf("Bit #%d is set in %d.\n", INDEX, number); else printf("Bit #%d is NOT set in %d.\n", INDEX, number); 

This code is ugly and completely implementation dependent (the C standard says the result is undefined). It works on x86, and it is somewhat more efficient, because the MSB is always copied to bit # 7 (the “sign”) of the flag register, which can be tested with a single jns instruction.

In other words, for INDEX 5 you have:

 [...] shl $0x1F, %eax test %eax, %eax jns 8053635 [...] 

The original OP solution is cleaner and what the production code should look like.

0
source

Any attempt to optimize this code falls under the premature optimization category. If you understand how the compiler translates C into machine code, you would not try to optimize this code. I assume that the interviewer did not have such knowledge.

If we analyze this code, this is what we get:

1<<5 translates to literal 32 at compile time. There is no performance difference between writing int mask = 1<<5; and int mask = 32; , but the latter is much more difficult to understand.

Further

  • if ((number & mask) == 0) fully equivalent
  • if ((number & 32) == 0) fully equivalent
  • if ((number & (1<<5)) == 0)

There are two cases:

  • Or the compiler must find a memory location to save the mask.
    • If the user declares a variable mask, the value will be stored there.
    • If the user has not declared a variable, the value will be stored in an invisible temporary variable.
    • RAM consumption in the two above cases is completely equivalent.
  • Or the compiler does not need to store the mask anywhere. It will optimize the entire mask variable or numeric literal and bake them along with the rest of the program instruction.

Which of these two will be selected depends on whether int number = 16; changed int number = 16; or not from the declaration point to the if statement, where masking occurs.

What is it. Any attempt to write code differently than in your original example is a premature optimization and obfuscation and will not lead to any performance difference.

0
source

Forgive the following answer:

I used to work at the start, when the company decided not to pursue the candidate, they came up with a false reason to stop the interview. Perhaps it was a poster experience.

the requesting k-th bit may mean that the least significant bit is a zero bit, so (number and 1 <5) will not do. But it's not a problem. He asked for optimization. Once upon a time, the reason you exit the interview has nothing to do with you. In this case, it is their loss; There will always be another interview opportunity.

0
source

In one of the interviews I gave the following answer, and he was pleased, but a small change in the question was “check if the nth bit is set.

 int N = 16; printf ("%d\n", (N >> (n-1)) % 2); 

So, when you make the answer general, Not quite sure which one (below) is faster for this example.

 1<<(n-1) & N (or) N>>(n-1) % 2 (or) N>>(n-1) & 1 
0
source

Source: https://habr.com/ru/post/911076/


All Articles