What is the best approach to multiply two bytes using only bit offsets and additions?

Starting question:

A group of us (students of electronic engineering - Great Britain) have been doing very well with programming the PIC16F84A microcontroller recently. There was a need to multiply two 8-bit numbers together, without any known min / max for each. A fellow student presented the following idea.

multiply_numbers: ; Takes numbers in Num1 and Num2, and returns product in OutH:OutL clrf OutH ; clear all non-input variables clrf OutL mult_loop bcf STATUS,c ; clear carry bit movfw Num2 addwf OutL ; add Num2 to OutL btfsc STATUS,c ; check carry bit incf OutH ; if set, increment OutH decfsz Num1 ; decrement Num1 goto mult_loop ; if Num1 is not zero, repeat loop return ; else return 

I felt that this, although rather short in terms of lines of code, could take a relatively long time for large numbers. I thought about myself a bit and started with the route of shifting one number to the right and the other to the left and adding the number shifted from the left several times, along the way, to the exit to get to the final answer. I was not quite right, but then came across this question on SO, which gave me the idea to express one of the input numbers as follows:

N = a_0 + a_1 * 2 + a_2 * 2 ^ 2 + a_3 * 2 ^ 3 + ... + a_7 * 2 ^ 7

From this starting point, I came up with this method for multiplying two 8-bit numbers to get 16-bit output (stored in two 8-bit registers).

 multiply_numbers: ; Takes numbers in Num1 and Num2L, and returns product in OutH:OutL clrf Num2H ; clear all non-input variables clrf OutL clrf OutH mult_loop btfsc Num1,0 ; test LSB of Num1 call add_num16 ; if set, add Num2H:Num2L to OutH:OutL call shift_left ; shift Num2H:Num2L left (multiply by 2) rrf Num1,f ; shift Num1 right clrw ; clear working register (0x00) bcf STATUS,z ; clear zero bit (3) of the STATUS register addwf Num1,w ; add 0x00 to Num1 btfss STATUS,z ; if Num1 is zero, then exit loop goto mult_loop ; else, continue with another iteration return add_num16 movfw Num2H addwf OutH,f ; add Num2H to OutH bcf STATUS,c ; clear carry bit (0) of the STATUS register movfw Num2L addwf OutL,f ; add Num2L to OutL btfsc STATUS,c ; check carry bit incf OutH,f ; increment OutH if set (OutL overflowed) return shift_left bcf STATUS,c ; clear carry bit rlf Num2L,f ; rotate Num2L left (carry -> LSB, MSB -> carry) rlf Num2H,f ; rotate Num2H left, using carry bit from Num2L return 

I think this second example is faster in most cases, simply because the loop will only repeat up to 8 times, and not up to 256 times.

Am I correct in my assumption regarding their relative speed / efficiency? And does the second block of code really work the way I intend it (are there any potential problems with it that I missed)? And finally, can this multiplication be further optimized using techniques that are not yet in use?

Thanks in advance.

PS All variables / registers were correctly defined with their own address. The extensive commenting on the code is that we are trying to compile a set of routines that we can reference in the future, and we still know at a glance what is happening and why.

PPS This question is related to a personal / hobby interest in programming this drawing and has nothing to do with any current course, etc. Just to eliminate any suspicions you might have!


Microcontroller: PIC16F84A
Development Environment: MPLABX IDE v1.10
Compiler: mpasm (v5.43)


Edit # 1:

  • Instead of testing LSB Num1 and adding left-shifted Num2H: Num2L to OutH: OutL, check out MSB Num1 and add Num2 to left-shifted OutH: OutL.
  • Make the "shift_left" inline, not the subfunction called.
  • Expand 'mult_loop' to optimize the eighth iteration.

Improved method 2:

 multiply_numbers: ; Takes numbers in Num1 and Num2, and returns product in OutH:OutL clrf OutL ; clear all non-input variables clrf OutH ; 1st iteration btfsc Num1,7 ; test MSB of Num1 call add_num8 ; if set, add Num2 to OutH:OutL bcf STATUS,c ; clear carry bit rlf OutL,f ; rotate OutL left (carry -> LSB, MSB -> carry) rlf OutH,f ; rotate OutH left, using carry bit from OutL rlf Num1,f ; shift Num1 left ; 2nd iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 3rd iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 4th iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 5th iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 6th iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 7th iteration btfsc Num1,7 call add_num8 bcf STATUS,c rlf OutL,f rlf OutH,f rlf Num1,f ; 8th iteration btfss Num1,7 ; test MSB of Num1 return ; if not set, then return. else... add_num8 bcf STATUS,c ; clear carry bit (0) of the STATUS register movfw Num2 addwf OutL,f ; add Num2L to OutL btfsc STATUS,c ; check carry bit incf OutH,f ; increment OutH if set (OutL overflowed) return 
+6
source share
2 answers

Yes, but you are likely to achieve more. To do this, there are a bunch of classic "tricks".

First, knowing that the multiplier can be interpreted as the sum of the powers of two, you will smartly add animation to the multiplier when the multipicand bit is nonzero.

Secondly, value added is only the size of the multiplicable. Although you need 16 (partial and final product), you do not need to make 16-bit additions; you can make 8 bits of additives and propagate any transfer. This is usually easy to do in assembler.

To save time, you do not want to call the ADD procedure in the middle of a loop either. Embed the code to save time spent on calling, returning and optimizing registry shuffling. Finally, you execute the cycle only and exactly 8 times; it is advisable to deploy such a loop 8 times in order to avoid overhead on the meter and reduce the "pressure in the register" caused by the need to tinker with the meter, which gives you more freedom for optimization.

Note that I did not say anything about the PIC controller, and in fact I do not know its instruction set. But what I said is relevant to anyone who implements 8-bit multiplication. (There are equivalent tricks for multiplying by 16, 32, and 64 bits). So on can abstractly write the following code:

  mul16: // computes M1 * M2 --> P where M1 and M2 are 8 bit values, P is 16 bits // P is represent by Plow and Phigh 8 bit values. // reset the (partial) product Plow=0; Phigh=0; // all 16 bits // First iteration: if msb(M1)==1 then { Plow+=M2; if carry then Phigh++; /* propagate carry */ } shift M1 left bit; shift (Phigh,Plow) left one bit // Second iteration <same as first> <3rd ..7th iteration, same as first> // 8th iteration if msb(M1)==1 then { Plow+=M2; if carry then Phigh++ } // dont bother: shift M1 left bit; // dont bother: shift (Phigh,Plow) left one bit <done> 

You can skilfully notice that what is written as โ€œif msb (M1) ...โ€ and โ€œshift M1 left one bitโ€ is often easily implemented with the โ€œshift leftโ€ assembler instructions, or in desperation, adding value to yourself: - } Similarly, "if you transfer ... add one" is often implemented with the instruction "add transfer".

I leave it for you to transcode it for the pic.

+5
source

Oh, my. I have not written multiplication code in assembler for about 30 years. This dates back to the time of writing code for Apple II assembler in 6502.

You are absolutely right that the second approach is much faster. 8 additions and 8 shifts significantly, much faster than up to 256 additions.

However, I think you have it back.

You want to start with MSB num1, and if this bit is 1, add num2 to your result. After each bit, but LSB from num1, shift the result remaining by 1.

+4
source

Source: https://habr.com/ru/post/916218/


All Articles