Multiply by 7 in an efficient way

Question

Multiply by 7 in an efficient way

I recently came across the following interview question:

How can you multiply a number by 7 in an efficient and optimized way?

I know that I can multiply by 8 (or shift left by three bits) and then subtract the original value:

num = (num << 3) - num;

but are there any other solutions.

+8

c

AMM Nov 03 '11 at 7:14

source share

5 answers

In a limited range, you can use a lookup table:

 static unsigned int mult7[] = {0, 7, 14, 21, ...}; unsigned int three = 3; unsigned int twenty_one = mult7[three];

This may seem silly (and probably this applies to this particular case), but it often helps for things where there is real value to calculate. I'm just not sure what multiplication by seven considers one of these cases.

To start, multiplying x by 7 (or shifting x three bits left after subtracting x ) is an operation that can be performed entirely inside the CPU. When viewing a table, you can see multiply by four (shift two bits to the left) and then add to get the correct address, but then you need to access the memory to perform the actual search - even with caching and other wonderful tricks that the current ones may have processors are likely to slow down.

There is also a good chance that your compiler will already know all the tricks on how to multiply quickly. If your sevens are constant (either const int or equivalent), the compiler probably already chose the fastest way, and there are good chances that compiler writers know a lot more about this than mere mortals :-) ^(a)

But for cases when the cost of calculations is relatively high, calculate the values once and implement them in your code, since the search table is one of the standard optimization strategies (response time for space).

^(a) Examine the following code:

 #include <stdio.h> static int mult7 (int num) { return num * 7; } int main (int argc, char *argv[]) { printf ("%d\n", mult7 (atoi (argv[1]))); return 0; }

During normal compilation on gcc , mult7 appears as a left shift and subtraction of the trick:

 _mult7: pushl %ebp ; stack frame setup. movl %esp, %ebp movl 8(%ebp), %edx ; get value to edx movl %edx, %eax ; and eax. sall $3, %eax ; eax <- eax * 8. subl %edx, %eax ; eax <- eax - edx. popl %ebp ; stack frame teardown and return. ret

In -O3 (what I like to call a crazy optimization level), all of this is built into main with:

 call _atoi movl $LC0, (%esp) leal 0(,%eax,8), %edx ; these two are the relevant instructions. subl %eax, %edx movl %edx, 4(%esp) call _printf

Please note that this nesting action is possible only because of the static nature of the function - if it were visible to the linker, it would have to be saved as a separate function if another object file was needed to call it.

If you uncheck static , it really saves it without binding with all the settings and deleting the stack frame, but at least it still uses the (supposedly) more efficient trick mentioned below. You can get rid of the stack frame code in gcc if you use -fomit-frame-pointer , if this does not adversely affect the code, but it starts to dig into the dark side a bit :-)

This trick is to use the LEA command to set edx to eax * 8 , and then subtract eax from it. The same theory as sall/subl under normal optimization is slightly different from mechanics.

Down below, trust your compiler. If you want to multiply num by 7, use the following:

 num *= 7;

Most likely, no matter what improvement you choose from such an attempt at micro-optimization, you could get a much better improvement by looking at the macro level (choice of algorithm and data structure, etc.).

+16

paxdiablo Nov 03 '11 at 7:22

source share

The way I do this will be something like

 num = (num << 3) - num;

i.e. 2 ^ 3 = 8, then subtract the number times 7.

I just compiled the following code with gcc:

 int mul(int num) { return num * 7; }

and this is a gdb dump of what it compiled:

 Dump of assembler code for function mul: 0x00000000004004c4 <+0>: push rbp 0x00000000004004c5 <+1>: mov rbp,rsp 0x00000000004004c8 <+4>: mov DWORD PTR [rbp-0x4],0xa 0x00000000004004cf <+11>: mov edx,DWORD PTR [rbp-0x4] 0x00000000004004d2 <+14>: mov eax,edx 0x00000000004004d4 <+16>: shl eax,0x3 0x00000000004004d7 <+19>: sub eax,edx 0x00000000004004d9 <+21>: mov DWORD PTR [rbp-0x4],eax 0x00000000004004dc <+24>: pop rbp 0x00000000004004dd <+25>: ret End of assembler dump.

So, it seems that my machine is shifting left 3 times, and then subtracting the multiplied number is what gcc considers optimal.

EDIT:. Aligns with an optimization level of at least 1 ( -O1 ), gcc uses the lea trick:

 Dump of assembler code for function mul: 0x00000000004004e0 <+0>: lea eax,[rdi*8+0x0] 0x00000000004004e7 <+7>: sub eax,edi 0x00000000004004e9 <+9>: ret End of assembler dump.

+6

AusCBloke Nov 03 '11 at 7:26

source share

In fact, the most efficient way to multiply by 7 might be to use the multiplication operator. It depends on the relative speed of the corresponding instructions on the target platform.

IMO, the full answer to such an interview question should also mention the following:

Such optimizations are usually best suited for the compiler / compiler. (Indeed, from another answer, it seems that gcc is optimizing this case.)
You (as a programmer) should only spend time on this if 1) there is a real (measurable) performance problem, and 2) your profiler tells you that the statements you are looking for are performance critical.

In his answer. Olaf wrote this:

"I disagree with Stephen C when he tells you what you should (or should not) do. If everyone did this, there would be no innovation in the software industry."

It would seem that Olaf does not believe any of the following:

that a software engineer should give advice,
for a software engineer to consult, or
so that the employee / programmer avoids wasting boss time on pointless hand optimization.

It is true that if everyone always acted on the advice he received, there would be less innovation. But the flip side is that working in the hand usually does not require much innovation. (And this rarely requires hand optimization ...)

Also, if ignoring advice (best practice) was a virtue, then 75% of software engineers would spend their time storing "goto spaghetti", assembler code, or the results of some 1990 quirk in design methodology.

So, you should at least understand the advice and consider the potential consequences of ignoring it. Just as the boss takes a dull look at your "innovative" (or rather, waste of time) on his projects.

+4

Stephen C Nov 03 '11 at 7:21

source share

As Stephen S says, "the most efficient way to multiply by 7 can be the multiplication operator."

In this article - Delays and throughput of AMD and Intel x86 processors - Thorbjørn Granlund from the Royal Institute of Technology in Stockholm shows that unsigned multiply requires 3/5 clock cycles in 32/64-bit modes in K10 and 4/4 architecture Sandy Bridge. If you need to perform multiple multiplication, then K10 can produce multiply each other clock in 32/64-bit modes. This means that it can work with three multiplications at different stages simultaneously (3/1) and 2.5 (5/2) in 64-bit. Sandy Bridge produces one each other / every measure in 32/64. This means two (4/2) or four (4/1) teams at the same time.

Personally, I think that it will be difficult for you to achieve more thanks to multi-shift sequence. I disagree with Stephen C when he tells you what you should (or should not) do. If everyone did this, there would be no innovation in the software industry.

So: go for it!

+1

Olof Forshell Nov 03 '11 at 9:45 a.m.

source share

Keith Thompson · Accepted Answer · 2011-11-03 07:23

To get a few out of 7 in an efficient way:

7 is a multiple of 7. This answers the question you asked, but I'm sure that it does not answer the question you want to ask.

EDITOR: The above is based on the original title of the question I just corrected.

To multiply by 7 effectively, simply write, for example:

 x * 7

and call your compiler with optimization. Let the compiler figure out if one MUL command is effective or something like (x<<3) - x for the current computer.

There is another implicit question: what was the interviewer looking for? I hope that “let the compiler worry about it” will be an acceptable answer. (x<<3) - x is probably the most obvious micro-optimization, but can lead to incorrect answers if x<<3 overflows, and depending on the system it may be slower than the MUL instruction.

(If I were an interviewer, I would be more impressed with a good explanation and understanding of the problems than any specific answer.)

EDIT

Think about the kinds of micro optimizations discussed here that can be useful if you know more about the possible x values than the compiler does. If you know, due to the nature of your program logic, that x will always be in the range 0..10, then the lookup table can be easily faster than the multiplication operation. Or, if you know that x is in this range 99% of the time, a lookup table with a deviation from the actual multiplication can only be that.

But if the analysis of the flow compiler of your program does not allow him to prove that x always in this range, then he cannot perform such optimization.

But such circumstances are very rare. And when your code runs in a new environment, where x may be 11 (maybe it works on a device with a large display), kaboom. And the performance improvement was most likely minor in the first place.

There are times when micro-optimization is appropriate, but there is considerable cost in development and testing. Do this only if actual measurements show that it is worth it.

Multiply by 7 in an efficient way

More articles: