
Binary hardware multiplication in AVR Assembler


All ATmega, AT90CAN and AT90PWM devices have an on-board hardware multiplier that performs 8-by-8-bit multiplications in only two clock cycles. So whenever you have to multiply and you are sure that the software will never need to run on an AT90S or ATtiny chip, you can make use of this hardware feature. This page shows how to do it.
The sections are:
  1. 8-by-8-binaries
  2. 16-by-8-binaries
  3. 16-by-16-binaries
  4. 16-by-24-binaries

1. Hardware multiplication of 8-by-8-bit binaries

Its use is simple and straightforward: if the two binaries to be multiplied are in the registers R16 and R17, just type

mul R16,R17

As the product of two 8-bit binaries can be up to 16 bits long, the result is placed in the registers R1 (most significant byte) and R0 (least significant byte). That's all there is to it.

The example program, simulated in the Studio, multiplies decimal 250 (hex FA) by decimal 100 (hex 64), held in the registers R16 and R17.

The registers R0 (LSB) and R1 (MSB) hold the result hex 61A8 or decimal 25,000.
And yes, that requires only two cycles, or 2 microseconds with a 1 MHz clock.
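
The register contents above can be cross-checked on a PC. The following is a minimal Python sketch, not AVR code; the helper name `mul8` is made up here and merely models how `mul` splits the 16-bit product across R1 and R0.

```python
def mul8(rd, rr):
    """Model of the AVR `mul` instruction for two unsigned 8-bit operands.

    Returns (R1, R0): the high and the low byte of the 16-bit product.
    """
    assert 0 <= rd <= 0xFF and 0 <= rr <= 0xFF
    product = rd * rr
    return product >> 8, product & 0xFF


r1, r0 = mul8(250, 100)
print(f"R1:R0 = {r1:02X}:{r0:02X}")  # prints "R1:R0 = 61:A8"
```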


2. Hardware multiplication of a 16- by an 8-bit-binary

You have a larger binary to multiply? The hardware is limited to 8 bits, so we need to invest some ingenuity instead. To solve the problem for larger binaries, we look at the combination of 16 and 8 bits first. Understanding this case makes the general method clear, so that you will later be able to solve the 32-by-64-bit multiplication problem as well.

First the math: a 16-bit binary is simply two 8-bit binaries, where the more significant of the two is multiplied by decimal 256 or hex 100. For those who need a reminder: the decimal 1234 is simply (12 multiplied by 100) plus 34, or (1 multiplied by 1000) plus (2 multiplied by 100) plus (3 multiplied by 10) plus 4. So the 16-bit binary m1 is equal to 256*m1M plus m1L, where m1M is the MSB and m1L is the LSB. Multiplying m1 by the 8-bit binary m2 is therefore, mathematically formulated:

m1 * m2 = (256*m1M + m1L) * m2, or 256*m1M*m2 + m1L*m2

So we just need to do two multiplications and add both results. Don't worry if you see three asterisks in the formula: the multiplication by 256 doesn't require any hardware at all in the binary world, because it is a simple move to the next higher byte. It is just like multiplying by 10 in the decimal world, which simply moves the number one digit to the left and writes a zero to the least significant digit.
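
The decomposition can be checked numerically. The short Python sketch below runs on a PC, not on the AVR, and uses example values picked for illustration; it confirms that two 8-bit products plus a one-byte shift reproduce the full product.

```python
m1 = 10000          # 16-bit example value, hex 2710
m2 = 250            # 8-bit example value, hex FA

m1M = m1 >> 8       # most significant byte of m1
m1L = m1 & 0xFF     # least significant byte of m1

# Multiplying by 256 is a shift by one byte (8 bits), not a real multiplication.
product = ((m1M * m2) << 8) + (m1L * m2)
print(hex(product))  # prints 0x2625a0
```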

So let's go to a practical example. First we need some registers to
  1. load the numbers m1 and m2,
  2. provide space for the result, which may be up to 24 bits long.

;
; Test hardware multiplication 16-by-8-bit
;
; Register definitions:
;
.def Res1 = R2
.def Res2 = R3
.def Res3 = R4
.def m1L = R16
.def m1M = R17
.def m2 = R18

First we load the numbers:

;
; Load Registers
;
.equ m1 = 10000
;
	ldi m1M,HIGH(m1) ; upper 8 bits of m1 to m1M
	ldi m1L,LOW(m1) ; lower 8 bits of m1 to m1L
	ldi m2,250 ; 8-bit constant to m2

The two numbers are loaded into R17:R16 (dec 10000 = hex 2710) and R18 (dec 250 = hex FA).

Then we multiply the LSB first:

;
; Multiply
;
	mul m1L,m2 ; Multiply LSB
	mov Res1,R0 ; copy result to result register
	mov Res2,R1

The LSB multiplication of hex 10 by hex FA yields hex 0FA0, written to the registers R0 (LSB, hex A0) and R1 (MSB, hex 0F). The result is copied to the lower two bytes of the result register, R3:R2.

Now the multiplication of the MSB of m1 with m2 follows:

	mul m1M,m2 ; Multiply MSB

The multiplication of the MSB of m1, hex 27, with m2, hex FA, yields hex 2616 in R1:R0.

Now two steps are performed at once: multiplication by 256 and adding the result to the previous result. This is done by adding R1:R0 to Res3:Res2 instead of Res2:Res1. R1 can just be copied to Res3. R0 is added to Res2 then. If the carry is set after adding, the next higher byte Res3 is increased by one.

	mov Res3,R1 ; copy MSB result to result byte 3
	add Res2,R0 ; add LSB result to result byte 2
	brcc NoInc ; if not carry, jump
	inc Res3
NoInc:

The result in R4:R3:R2 is hex 2625A0, which is decimal 2500000 (as everybody knows), and is obviously correct.
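
To follow the carry handling byte by byte, the routine can be replayed on a PC. This is a Python sketch rather than AVR code, and the function name `mul16x8` is invented for this illustration; each step is commented with the assembler instruction it stands for.

```python
def mul16x8(m1, m2):
    """Replay the 16-by-8 routine step by step; returns (Res3, Res2, Res1)."""
    m1M, m1L = m1 >> 8, m1 & 0xFF

    # mul m1L,m2 ; mov Res1,R0 ; mov Res2,R1
    p = m1L * m2
    res1, res2 = p & 0xFF, p >> 8

    # mul m1M,m2 ; mov Res3,R1 ; add Res2,R0
    p = m1M * m2
    res3 = p >> 8
    total = res2 + (p & 0xFF)
    res2 = total & 0xFF

    # brcc NoInc ; inc Res3  (executed only if the addition set the carry)
    if total > 0xFF:
        res3 += 1
    return res3, res2, res1


print(" ".join(f"{b:02X}" for b in mul16x8(10000, 250)))    # prints "26 25 A0"
print(" ".join(f"{b:02X}" for b in mul16x8(0x01FF, 0xFF)))  # prints "01 FD 01"
```

The second call uses an input where adding R0 to Res2 does produce a carry, so the `inc Res3` path is exercised as well.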

The cycle counter for the multiplication stands at 10, which at a 1 MHz clock is a total of 10 microseconds. Very much faster than software multiplication!
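
The count of 10 can be reproduced by adding up instruction timings. The sketch below is Python, not AVR code, and rests on assumed timings: `mul` two cycles (as stated above), `mov`, `add` and `inc` one cycle each, and `brcc` one cycle when it falls through or two when it jumps. Please verify these against the instruction set documentation for your device.

```python
# Assumed cycle counts per instruction (check the data sheet of your device).
CYCLES = {"mul": 2, "mov": 1, "add": 1, "inc": 1}


def cycles_16x8(carry):
    """Cycle total of the 16-by-8 sequence; `carry` tells whether `add` set the carry flag."""
    straight = ["mul", "mov", "mov", "mul", "mov", "add"]
    total = sum(CYCLES[op] for op in straight)
    if carry:
        total += 1 + CYCLES["inc"]  # brcc falls through (1 cycle), then inc
    else:
        total += 2                  # brcc jumps to NoInc (2 cycles)
    return total


print(cycles_16x8(False), cycles_16x8(True))  # prints "10 10"
```

Under these assumptions both paths take the same time, so the run time does not depend on whether a carry occurs.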



3. Hardware multiplication of a 16- by a 16-bit-binary

Now that we have understood the principle, it should be easy to do 16-by-16. The result requires four bytes now (Res4:Res3:Res2:Res1, located in R5:R4:R3:R2). The formula is:

m1 * m2 = (256*m1M + m1L) * (256*m2M + m2L) = 65536*m1M*m2M + 256*m1M*m2L + 256*m1L*m2M + m1L*m2L

This obviously takes four multiplications. We start with the first and the last term, as these are the two easiest: their results are simply copied to the correct positions in the result registers. The results of the two multiplications in the middle of the formula have to be added to the middle of our result registers, with possible carries into the most significant byte of the result. For that we use a simple trick that is easy to understand: a register that holds zero is added with carry. The software:

;
; Test Hardware Multiplication 16 by 16
;
; Define Registers
;
.def Res1 = R2
.def Res2 = R3
.def Res3 = R4
.def Res4 = R5
.def m1L = R16
.def m1M = R17
.def m2L = R18
.def m2M = R19
.def tmp = R20
;
; Load input values
;
.equ m1 = 10000
.equ m2 = 25000
;
	ldi m1M,HIGH(m1)
	ldi m1L,LOW(m1)
	ldi m2M,HIGH(m2)
	ldi m2L,LOW(m2)
;
; Multiply
;
	clr tmp ; clear tmp (R20), used to add the carry
	mul m1M,m2M ; Multiply MSBs
	mov Res3,R0 ; copy to MSW Result
	mov Res4,R1
	mul m1L,m2L ; Multiply LSBs
	mov Res1,R0 ; copy to LSW Result
	mov Res2,R1
	mul m1M,m2L ; Multiply 1M with 2L
	add Res2,R0 ; Add to Result
	adc Res3,R1
	adc Res4,tmp ; add carry
	mul m1L,m2M ; Multiply 1L with 2M
	add Res2,R0 ; Add to Result
	adc Res3,R1
	adc Res4,tmp
;
; Multiplication done
;


Simulation shows the following steps:
  1. Loading the two constants 10000 (hex 2710) and 25000 (hex 61A8) to the registers in the upper register space.
  2. Multiplying the two MSBs (hex 27 and 61) and copying the result in R1:R0 to the two uppermost result registers R5:R4.
  3. Multiplying the two LSBs (hex 10 and A8) and copying the result in R1:R0 to the two lower result registers R3:R2.
  4. Multiplying the MSB of m1 with the LSB of m2 and adding the result in R1:R0 to the result register's two middle bytes; no carry occurred.
  5. Multiplying the LSB of m1 with the MSB of m2 and adding the result in R1:R0 to the result register's two middle bytes; no carry occurred. The result is hex 0EE6B280, which is 250000000 and obviously correct.

Multiplication needed 19 clock cycles, which is very much faster than software multiplication. Another advantage here: the required time is ALWAYS exactly 19 cycles. It depends neither on the input numbers (as is the case with software multiplication) nor on whether overflows occur (thanks to our small trick of adding zero with carry). So you can rely on this.
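
In the example above no carry occurred, so the effect of adding zero with carry is not visible. The following Python sketch (a PC-side model, with the helper names `add8` and `mul16x16` invented for illustration) replays the same sequence of steps and also tries an input where the carry does reach the top byte.

```python
def add8(a, b, carry=0):
    """8-bit addition with carry-in, like add/adc; returns (sum, carry-out)."""
    total = a + b + carry
    return total & 0xFF, total >> 8


def mul16x16(m1, m2):
    """Replay the 16-by-16 routine byte by byte; returns the 32-bit result."""
    m1M, m1L = m1 >> 8, m1 & 0xFF
    m2M, m2L = m2 >> 8, m2 & 0xFF
    tmp = 0                                # clr tmp

    p = m1M * m2M                          # mul m1M,m2M
    res3, res4 = p & 0xFF, p >> 8          # mov Res3,R0 ; mov Res4,R1
    p = m1L * m2L                          # mul m1L,m2L
    res1, res2 = p & 0xFF, p >> 8          # mov Res1,R0 ; mov Res2,R1

    for p in (m1M * m2L, m1L * m2M):       # the two middle terms
        res2, c = add8(res2, p & 0xFF)     # add Res2,R0
        res3, c = add8(res3, p >> 8, c)    # adc Res3,R1
        res4, c = add8(res4, tmp, c)       # adc Res4,tmp
    return (res4 << 24) | (res3 << 16) | (res2 << 8) | res1


print(f"{mul16x16(10000, 25000):08X}")    # prints 0EE6B280
print(f"{mul16x16(0xFFFF, 0xFFFF):08X}")  # prints FFFE0001
```

With both inputs at hex FFFF, the last middle term overflows Res3, and only the final `adc Res4,tmp` brings Res4 from hex FE to the correct hex FF.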



4. Hardware multiplication of a 16- by a 24-bit-binary

The multiplication of a 16-bit binary "a" with a 24-bit binary "b" yields results of up to 40 bits in length. The multiplication scheme requires six 8-by-8-bit multiplications, each result being added at the appropriate position in the result registers. The assembler source code for this:

; Hardware Multiplication 16 by 24 bit
.include "m8def.inc"
;
; Register definitions
.def a1 = R2 ; define 16-bit register
.def a2 = R3
.def b1 = R4 ; define 24-bit register
.def b2 = R5
.def b3 = R6
.def e1 = R7 ; define 40-bit result register
.def e2 = R8
.def e3 = R9
.def e4 = R10
.def e5 = R11
.def c0 = R12 ; help register for adding
.def rl = R16 ; load register
;
; Load constants
.equ a = 10000 ; factor a, hex 2710
.equ b = 1000000 ; factor b, hex 0F4240
	ldi rl,BYTE1(a) ; load a
	mov a1,rl
	ldi rl,BYTE2(a)
	mov a2,rl
	ldi rl,BYTE1(b) ; load b
	mov b1,rl
	ldi rl,BYTE2(b)
	mov b2,rl
	ldi rl,BYTE3(b)
	mov b3,rl
;
; Clear registers
	clr e1 ; clear result registers
	clr e2
	clr e3
	clr e4
	clr e5
	clr c0 ; clear help register
;
; Multiply
	mul a2,b3 ; term 1
	add e4,R0 ; add to result
	adc e5,R1
	mul a2,b2 ; term 2
	add e3,R0
	adc e4,R1
	adc e5,c0 ; (add possible carry)
	mul a2,b1 ; term 3
	add e2,R0
	adc e3,R1
	adc e4,c0
	adc e5,c0
	mul a1,b3 ; term 4
	add e3,R0
	adc e4,R1
	adc e5,c0
	mul a1,b2 ; term 5
	add e2,R0
	adc e3,R1
	adc e4,c0
	adc e5,c0
	mul a1,b1 ; term 6
	add e1,R0
	adc e2,R1
	adc e3,c0
	adc e4,c0
	adc e5,c0
;
; done.
	nop
; Result should be hex 02540BE400
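
The same scheme extends to operands of any byte length, including the 32-by-64-bit case mentioned earlier. As a PC-side cross-check, here is a Python sketch of the general pattern; the function name `mul_bytes` and its little-endian byte lists are assumptions made for this illustration and are not part of the assembler source.

```python
def mul_bytes(a, b):
    """Schoolbook multiplication of two little-endian byte lists.

    Each 8-by-8 product a[i]*b[j] is added at result position i+j,
    and any carry is rippled into the higher bytes, as with adc.
    """
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = ai * bj                      # one `mul`: 16-bit product
            carry = 0
            for k, byte in enumerate((p & 0xFF, p >> 8)):
                total = result[i + j + k] + byte + carry
                result[i + j + k] = total & 0xFF
                carry = total >> 8
            k = i + j + 2
            while carry:                     # like adc with the zeroed help register
                total = result[k] + carry
                result[k] = total & 0xFF
                carry = total >> 8
                k += 1
    return result


a = list((10000).to_bytes(2, "little"))      # a1, a2
b = list((1000000).to_bytes(3, "little"))    # b1, b2, b3
e = mul_bytes(a, b)                          # e1 .. e5
print("".join(f"{x:02X}" for x in reversed(e)))  # prints 02540BE400
```

Unlike the assembler routine, this sketch stops rippling as soon as the carry is zero, so it models the arithmetic but not the constant run time.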

The complete execution requires

2008 by http://www.avr-asm-tutorial.net