# Beginner's introduction to AVR assembler language

## Multiplying mantissas of floating point numbers in assembler language

### Floating point multiplication: yes, if necessary

To multiply two floating point numbers we have to

1. add the two exponents, and
2. multiply the two mantissas.

While the first part is rather simple (just add the two signed bytes), the second part is a little more complicated if your mantissas are longer than eight bits.
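The first step can be sketched in C as follows. This assumes plain two's-complement exponents without a bias and treats a sum outside -128..127 as an error. `add_exponents` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: adds two signed 8-bit exponents (assumed to be
   plain two's complement, without a bias). Returns false if the sum
   does not fit into a signed byte, i.e. on exponent overflow. */
bool add_exponents(int8_t e1, int8_t e2, int8_t *sum)
{
    int s = (int)e1 + (int)e2;      /* -256..254, cannot overflow an int */
    if (s < INT8_MIN || s > INT8_MAX)
        return false;               /* result too large or too small */
    *sum = (int8_t)s;
    return true;
}
```

On the AVR, ADD sets the V flag on a two's-complement overflow, so the same check can be done with a BRVS after the ADD.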

To demonstrate that floats are not as easy to handle as integers, I show here the multiplication of two 56-bit float mantissas (from a 64-bit float with an 8-bit exponent and a 56-bit mantissa).

By the end, I am sure, you will be convinced that using floats is neither a good nor an efficient idea.

### Floating point mantissas, 56 bits

Here we see two 56-bit wide mantissas that need to be multiplied. As you have already learned, mantissa bits start with 0.5 decimal, followed by 0.25 decimal, and so on. The further to the right, the lower the value: each bit is worth half of the one to its left. I therefore numbered the bits here as $2^{-N}$.
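A small C sketch of these bit weights. It assumes the mantissa is held right-aligned in a `uint64_t`, with its leftmost bit worth $2^{-1}$. `mantissa_value` is a hypothetical helper. A `double` holds only 53 significant bits, so for a full 56-bit mantissa the returned value is rounded.

```c
#include <stdint.h>

/* Hypothetical helper: value of a `bits`-wide mantissa m (bits <= 63).
   The leftmost bit is worth 2^-1, the next 2^-2, and so on: the weight
   is halved from one bit to the next. */
double mantissa_value(uint64_t m, int bits)
{
    double value = 0.0;
    double weight = 0.5;                        /* weight of bit 2^-1 */
    for (int n = 1; n <= bits; n++) {
        if ((m >> (bits - n)) & 1u)             /* bit numbered 2^-n */
            value += weight;
        weight /= 2.0;
    }
    return value;
}
```

For example, the 8-bit mantissa 0xC0 (binary 11000000) is 0.5 + 0.25 = 0.75.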

Multiplying those two mantissas yields a 112-bit product whose bits are numbered in the same way. Clearly, the bits to the right contribute less to the result. If we multiply $2^{-56}$ (decimal ca. 1.4E-17) by $2^{-32}$, we get $2^{-88}$ (decimal roughly 3E-27), which lies far below the last bit of our 56-bit result. To have some more bits than needed, we keep 16 extra bits to the right and use them later for rounding.
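The magnitudes from this paragraph, worked out with the decimal values rounded:

$$
2^{-56} \approx 1.4\cdot 10^{-17},\qquad 2^{-56}\cdot 2^{-32} = 2^{-(56+32)} = 2^{-88} \approx 3.2\cdot 10^{-27}
$$

$$
56 + 16 = 72\ \text{bits} = 9\ \text{bytes},\qquad \text{last kept bit: } 2^{-72} \approx 2.1\cdot 10^{-22}
$$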

I named the bytes of the two mantissas Z1N and Z2N, where N runs from 1 to 7, and the bytes of the result mantissa ResN, where N runs from 1 to 9 (seven bytes plus two extra bytes for rounding).

To multiply these mantissas in assembly language, I placed the 7 + 7 + 9 bytes in the AVR's registers. I used reversed ordering: the lower the register number, the higher the byte's value. Also note that N starts with 1, not with 0. You could place the bytes in SRAM instead, but that adds many additional instructions and increases execution time considerably.
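In C, the same byte order can be sketched with an array whose index 1 holds the most significant byte. Index 0 is left unused so that N starts at 1, as in the text. `split_mantissa` is a hypothetical helper. It performs the same split that the `Byte1()` to `Byte4()` expressions perform in the initialization part of the listing.

```c
#include <stdint.h>

/* Hypothetical helper: splits a 56-bit mantissa into z[1]..z[7],
   z[1] being the most significant byte (like rZ11), z[7] the least
   significant one (like rZ17). z[0] is unused so that N starts at 1. */
void split_mantissa(uint64_t m, uint8_t z[8])
{
    z[0] = 0;                                   /* unused slot */
    for (int n = 1; n <= 7; n++)
        z[n] = (uint8_t)(m >> (8 * (7 - n)));   /* byte n has weight 2^(-8n) */
}
```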

### The math of multiplication

Each mantissa now consists of seven bytes, each worth $2^{-8}$ times (that is, 1/256 of) the byte to its left.

To multiply them, we have to consider each byte's exponential multiplier: $2^{-8}$, $2^{-16}$, and so on. The bytes, multiplied by their exponential multipliers, sum up to the numbers Z1 and Z2.
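Written as formulas, with Z1N and Z2M standing for byte values from 0 to 255, the mantissas are as follows. The index M for the bytes of Z2 is used here only to tell the two sums apart. The result bytes ResK are read the same way.

$$
Z1 = \sum_{N=1}^{7} Z1N \cdot 2^{-8N}, \qquad Z2 = \sum_{M=1}^{7} Z2M \cdot 2^{-8M}, \qquad Res = \sum_{K=1}^{9} ResK \cdot 2^{-8K}
$$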

As each number consists of seven bytes with their exponential multipliers, we have to multiply each byte of Z1 by each byte of Z2, which gives 7 × 7 = 49 products.

Here are the formulas for multiplying out the two brackets. As can be seen, all terms Z1N * Z2M (with N and M from 1 to 7) have to be multiplied. As their exponential multipliers are multiplied by adding the exponents, weights between $2^{-16}$ and $2^{-112}$ result. The parts that have a red background contribute neither to the result nor to the rounding and can be skipped.
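Multiplying out the two sums and splitting each 16-bit byte product into its high byte $P_H$ (delivered in R1) and its low byte $P_L$ (delivered in R0) shows where each byte belongs in the result:

$$
\begin{aligned}
Z1\cdot Z2 &= \sum_{N=1}^{7}\sum_{M=1}^{7} Z1N\cdot Z2M\cdot 2^{-8(N+M)}\\
Z1N\cdot Z2M &= 256\cdot P_H + P_L\\
Z1N\cdot Z2M\cdot 2^{-8(N+M)} &= P_H\cdot 2^{-8(N+M-1)} + P_L\cdot 2^{-8(N+M)}
\end{aligned}
$$

So $P_L$ is added to Res(N+M) and $P_H$ to Res(N+M-1). For N+M ≥ 11, both bytes fall below Res9. For N+M = 10, only the high byte still reaches Res9.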

The result is the sum of all those products. To initialize this sum, we could simply clear the result bytes to zero. But by using the yellow-highlighted multiplications, we can skip this step: their byte positions do not overlap, so their results can be copied directly into the appropriate result registers. Only Res9 is a little more complicated, because it only receives an MSB without its LSB.
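The initializing products of the listing below (the MUL/MOV pairs at the start of its multiplication part) can be written out as follows. Together they fill Res1 to Res9 without any addition:

$$
\begin{aligned}
Res1{:}Res2 &= Z11\cdot Z21 && (N+M=2)\\
Res3{:}Res4 &= Z13\cdot Z21 && (N+M=4)\\
Res5{:}Res6 &= Z15\cdot Z21 && (N+M=6)\\
Res7{:}Res8 &= Z17\cdot Z21 && (N+M=8)\\
Res9 &= \text{high byte of } Z13\cdot Z27 && (N+M=10)
\end{aligned}
$$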

Multiplying Z1N by Z2M uses the built-in hardware multiplier. The MUL instruction takes two registers and delivers the 16-bit product in the register pair R1:R0. This result has to be added to the result registers at the appropriate location: R0 to Res(N+M) and R1 to Res(N+M-1). As the additions can produce carries, those carries have to ripple up the register line to Res1.
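A minimal C model of this step. It assumes the result bytes are kept in an array `res[1..9]`, with `res[1]` as the most significant byte (mirroring rRes1 to rRes9). The function name `add_product` is hypothetical. It performs the same work as the ADD, the ADC and the trailing `adc rResX,rmp` instructions in the assembler code, including losing a carry out of `res[1]`.

```c
#include <stdint.h>

/* Hypothetical model of one accumulation step: adds the 16-bit MUL result
   p (high byte = R1, low byte = R0) to the result bytes res[1..9], where
   res[1] is the most significant byte and res[0] is unused.
   k = N + M must lie between 2 and 9. */
void add_product(uint8_t res[10], int k, uint16_t p)
{
    unsigned sum = res[k] + (p & 0xFFu);      /* ADD rResK,R0 */
    res[k] = (uint8_t)sum;
    unsigned carry = sum >> 8;

    sum = res[k - 1] + (p >> 8) + carry;      /* ADC rRes(K-1),R1 */
    res[k - 1] = (uint8_t)sum;
    carry = sum >> 8;

    for (int i = k - 2; i >= 1; i--) {        /* ADC rResI,rmp with rmp = 0 */
        sum = res[i] + carry;
        res[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    /* a carry out of res[1] is lost, as on the AVR */
}
```

For example, adding 0x0101 at K = 4 to the bytes 00 00 FF FF (res[1] to res[4]) gives 00 01 01 00.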

### The assembler source code

The assembler source code is simple. The 56-bit numbers Z1 and Z2 are defined as constants at the top of the code and are written to the proper registers at the beginning.
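A PC-side C model of the same scheme can help check what the result registers should contain after a simulator run. This is a sketch, not the author's code. The functions `add_at`, `mult_mantissas`, `exact_product`, `round_result` and `to_u64` are hypothetical names, and byte arrays indexed from 1 stand in for the registers. Rounding is assumed to be round-half-up at the first rounding byte. `exact_product` forms all 49 products and serves as a reference.

```c
#include <stdint.h>

/* Adds the 16-bit product p to r[]: low byte at r[k], high byte at
   r[k-1], carry rippled up to r[1] (r[1] = most significant byte). */
static void add_at(uint8_t r[], int k, unsigned p)
{
    unsigned carry = 0;
    for (int i = k; i >= 1; i--) {
        unsigned addend = (i == k) ? (p & 0xFFu) : (i == k - 1) ? (p >> 8) : 0u;
        unsigned sum = r[i] + addend + carry;
        r[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}

/* Listing's scheme: z1[1..7], z2[1..7] in, res[1..9] out. */
void mult_mantissas(const uint8_t z1[8], const uint8_t z2[8], uint8_t res[10])
{
    res[0] = 0;                                   /* unused slot */
    for (int n = 1; n <= 7; n += 2) {             /* Z11, Z13, Z15, Z17 times Z21 */
        unsigned p = (unsigned)(z1[n] * z2[1]);
        res[n] = (uint8_t)(p >> 8);               /* MSB (R1) */
        res[n + 1] = (uint8_t)p;                  /* LSB (R0) */
    }
    res[9] = (uint8_t)((unsigned)(z1[3] * z2[7]) >> 8);  /* MSB of Z13*Z27 */
    for (int n = 1; n <= 7; n++)
        for (int m = 1; m <= 7; m++) {
            int init = (m == 1 && n % 2 == 1);    /* already used above */
            if (n + m <= 9 && !init)              /* N+M >= 10 is skipped */
                add_at(res, n + m, (unsigned)(z1[n] * z2[m]));
        }
}

/* Exact reference: all 49 products summed into ex[1..14]. */
void exact_product(const uint8_t z1[8], const uint8_t z2[8], uint8_t ex[15])
{
    for (int i = 0; i < 15; i++)
        ex[i] = 0;
    for (int n = 1; n <= 7; n++)
        for (int m = 1; m <= 7; m++)
            add_at(ex, n + m, (unsigned)(z1[n] * z2[m]));
}

/* Round half up to 7 bytes: add 0x80 to r[8], ripple the carry to r[1]. */
void round_result(uint8_t r[])
{
    unsigned carry = 0x80;
    for (int i = 8; i >= 1; i--) {
        unsigned sum = r[i] + carry;
        r[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}

/* Packs r[1..7] into a 56-bit integer (r[1] = most significant byte). */
uint64_t to_u64(const uint8_t r[])
{
    uint64_t v = 0;
    for (int n = 1; n <= 7; n++)
        v = (v << 8) | r[n];
    return v;
}
```

For the test constants Z1 = Z2 = 0x7FFFFFFFFFFFFF, both the truncated scheme and the exact reference round to 0x3FFFFFFFFFFFFF, i.e. 0.25 - 2^-56. In general, the skipped products sum to less than 2^-61, so the two rounded results can differ by at most one unit in the last place.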

Note that the code uses neither subroutines, macros nor branches, so it is rather straightforward. When adding a multiplication result to the result registers, I used ADC with the zero in rmp to handle any carries. Branching on carry clear would increase the instruction count but could decrease execution time, depending on the numbers.
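The trade-off between the fixed ADC chain and branching on carry clear can be modelled in C as two ripple loops. `ripple_always` and `ripple_early_exit` are hypothetical names, and the number of bytes processed stands in for execution time. The first loop always walks up to `res[1]` (constant time, as in the listing). The second stops as soon as the carry is clear (data-dependent time, like an ADC followed by BRCC).

```c
#include <stdint.h>

/* Hypothetical helpers: ripple a carry into res[k] and up towards res[1]
   (res[1] = most significant byte). Both return how many bytes they
   processed, as a stand-in for execution time. */

/* Fixed chain, like "adc rResX,rmp" for every byte: constant time. */
int ripple_always(uint8_t res[], int k, unsigned carry)
{
    int steps = 0;
    for (int i = k; i >= 1; i--, steps++) {
        unsigned sum = res[i] + carry;
        res[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    return steps;
}

/* Early exit, like ADC followed by BRCC: time depends on the data. */
int ripple_early_exit(uint8_t res[], int k, unsigned carry)
{
    int steps = 0;
    for (int i = k; i >= 1 && carry != 0; i--, steps++) {
        unsigned sum = res[i] + carry;
        res[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    return steps;
}
```

Both loops leave the same bytes behind; only the amount of work differs.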

The 286 instructions of the code, of which the first 28 set the constants, take 321 µs at a clock frequency of 1 MHz. The execution time is independent of the numbers used.

The code:
``````
;
; ****************************************
; * Multiply two 56-bit float mantissas  *
; * with a hardware multiplier, here m48 *
; * (C)2022 by Gerhard Schmidt           *
; ****************************************
;
.nolist
.include "m48adef.inc" ; Define device ATmega48A
.list
;
; Multiplies two 56-bit float mantissas by using
; the hardware multiplier
;
; Test numbers
.equ Z1 = 0x7FFFFFFFFFFFFF
.equ Z2 = 0x7FFFFFFFFFFFFF
;
; **********************************
;       R E G I S T E R S
; **********************************
;
; Used: R1:R0 for multiplication
.def rZ11 = R2 ; Z1, byte 1
.def rZ12 = R3 ; dto., byte 2
.def rZ13 = R4 ; dto., byte 3
.def rZ14 = R5 ; dto., byte 4
.def rZ15 = R6 ; dto., byte 5
.def rZ16 = R7 ; dto., byte 6
.def rZ17 = R8 ; dto., byte 7
.def rZ21 = R9 ; Z2, byte 1
.def rZ22 = R10 ; dto., byte 2
.def rZ23 = R11 ; dto., byte 3
.def rZ24 = R12 ; dto., byte 4
.def rZ25 = R13 ; dto., byte 5
.def rZ26 = R14 ; dto., byte 6
.def rZ27 = R15 ; dto., byte 7
.def rmp = R16 ; Define multipurpose register
.def rDummy = R17 ; Not used
.def rRes1 = R18 ; Result, byte 1
.def rRes2 = R19 ; dto., byte 2
.def rRes3 = R20 ; dto., byte 3
.def rRes4 = R21 ; dto., byte 4
.def rRes5 = R22 ; dto., byte 5
.def rRes6 = R23 ; dto., byte 6
.def rRes7 = R24 ; dto., byte 7
.def rRes8 = R25 ; dto., byte 8, rounding MSB
.def rRes9 = R26 ; dto., byte 9, rounding LSB
; free: R27 to R31
;
.cseg
.org 000000
; **********************************
;  M A I N   P R O G R A M   I N I T
; **********************************
;
Main:
; Init the mantissas
ldi rmp,Byte3(Z1 / (1<<32))
mov rZ11,rmp
ldi rmp,Byte2(Z1 / (1<<32))
mov rZ12,rmp
ldi rmp,Byte1(Z1 / (1<<32))
mov rZ13,rmp
ldi rmp,Byte4(Z1)
mov rZ14,rmp
ldi rmp,Byte3(Z1)
mov rZ15,rmp
ldi rmp,Byte2(Z1)
mov rZ16,rmp
ldi rmp,Byte1(Z1)
mov rZ17,rmp
ldi rmp,Byte3(Z2 / (1<<32))
mov rZ21,rmp
ldi rmp,Byte2(Z2 / (1<<32))
mov rZ22,rmp
ldi rmp,Byte1(Z2 / (1<<32))
mov rZ23,rmp
ldi rmp,Byte4(Z2)
mov rZ24,rmp
ldi rmp,Byte3(Z2)
mov rZ25,rmp
ldi rmp,Byte2(Z2)
mov rZ26,rmp
ldi rmp,Byte1(Z2)
mov rZ27,rmp
;
; Multiply the mantissas
mul rZ11,rZ21 ; Init the result registers
mov rRes2,R0
mov rRes1,R1
mul rZ13,rZ21
mov rRes4,R0
mov rRes3,R1
mul rZ15,rZ21
mov rRes6,R0
mov rRes5,R1
mul rZ17,rZ21
mov rRes8,R0
mov rRes7,R1
mul rZ13,rZ27
mov rRes9,R1
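; Multiply Z11
clr rmp ; Adder for carry
; Each product below must be added to the result: R0 to the
; first register named in its comment, R1 with carry to the
; second one, then the carry is rippled up to rRes1 with
; "adc rResX,rmp" (rmp = 0). The full sequence is shown for
; the first product; the others follow the same pattern.
mul rZ11,rZ22 ; R0 -> rRes3, R1 -> rRes2
add rRes3,R0
adc rRes2,R1
adc rRes1,rmp ; carry up to rRes1
mul rZ11,rZ23 ; R0 -> rRes4, R1 -> rRes3
mul rZ11,rZ24 ; R0 -> rRes5, R1 -> rRes4
mul rZ11,rZ25 ; R0 -> rRes6, R1 -> rRes5
mul rZ11,rZ26 ; R0 -> rRes7, R1 -> rRes6
mul rZ11,rZ27 ; R0 -> rRes8, R1 -> rRes7
; Multiply Z12
mul rZ12,rZ21 ; R0 -> rRes3, R1 -> rRes2
mul rZ12,rZ22 ; R0 -> rRes4, R1 -> rRes3
mul rZ12,rZ23 ; R0 -> rRes5, R1 -> rRes4
mul rZ12,rZ24 ; R0 -> rRes6, R1 -> rRes5
mul rZ12,rZ25 ; R0 -> rRes7, R1 -> rRes6
mul rZ12,rZ26 ; R0 -> rRes8, R1 -> rRes7
mul rZ12,rZ27 ; R0 -> rRes9, R1 -> rRes8
; Multiply Z13
mul rZ13,rZ22 ; R0 -> rRes5, R1 -> rRes4
mul rZ13,rZ23 ; R0 -> rRes6, R1 -> rRes5
mul rZ13,rZ24 ; R0 -> rRes7, R1 -> rRes6
mul rZ13,rZ25 ; R0 -> rRes8, R1 -> rRes7
mul rZ13,rZ26 ; R0 -> rRes9, R1 -> rRes8
; Multiply Z14
mul rZ14,rZ21 ; R0 -> rRes5, R1 -> rRes4
mul rZ14,rZ22 ; R0 -> rRes6, R1 -> rRes5
mul rZ14,rZ23 ; R0 -> rRes7, R1 -> rRes6
mul rZ14,rZ24 ; R0 -> rRes8, R1 -> rRes7
mul rZ14,rZ25 ; R0 -> rRes9, R1 -> rRes8
; Multiply Z15
mul rZ15,rZ22 ; R0 -> rRes7, R1 -> rRes6
mul rZ15,rZ23 ; R0 -> rRes8, R1 -> rRes7
mul rZ15,rZ24 ; R0 -> rRes9, R1 -> rRes8
; Multiply Z16
mul rZ16,rZ21 ; R0 -> rRes7, R1 -> rRes6
mul rZ16,rZ22 ; R0 -> rRes8, R1 -> rRes7
mul rZ16,rZ23 ; R0 -> rRes9, R1 -> rRes8
; Multiply Z17
mul rZ17,rZ22 ; R0 -> rRes9, R1 -> rRes8
; Rounding to 56 bits length: add 0x80 to the first rounding
; byte and ripple the carry up to rRes1
ldi rmp,0x80
add rRes8,rmp
ldi rmp,0 ; LDI leaves the carry flag unchanged
adc rRes7,rmp
adc rRes6,rmp
adc rRes5,rmp
adc rRes4,rmp
adc rRes3,rmp
adc rRes2,rmp
adc rRes1,rmp
Loop:
rjmp Loop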
;
; End of source code
;