
Beginner's introduction to AVR assembler language

Multiplying mantissas of floating point numbers in assembler language

Floating point multiplication: yes, if necessary

To multiply two floating point numbers we have to
  1. add the two exponents, and
  2. multiply the two mantissas.
While the first part is rather simple (just add the two signed exponent bytes), the second part is considerably more complicated if your mantissas are longer than eight bits.
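Why these two steps are sufficient can be written in one line. As a sketch, assume that a float x is stored as a mantissa m and a signed base-2 exponent e, so that x = m · 2^e. The symbols m and e are introduced here only for illustration, and the sign of the number is left aside:

```latex
x_1 \cdot x_2
  = \left(m_1 \cdot 2^{e_1}\right) \cdot \left(m_2 \cdot 2^{e_2}\right)
  = \left(m_1 \cdot m_2\right) \cdot 2^{\,e_1 + e_2}
```

The exponent part e_1 + e_2 is the simple addition of two signed bytes. The mantissa product m_1 · m_2 is the expensive part that the rest of this page deals with.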

To demonstrate that floats are not as easy to handle as integers, I show here the multiplication of two 56-bit float mantissas (of a 64-bit float with an 8-bit exponent and a 56-bit mantissa).

By the end, I am sure, you will be convinced that using floats is neither a good nor an efficient idea.

Floating point mantissas, 56 bits

(Figure: Two mantissas to be multiplied)

Here we see two 56-bit-wide mantissas that need to be multiplied. As you already learned, the mantissa bits start with 0.5 decimal, then 0.25 decimal, and so on. The further to the right, the lower the value: each bit is worth half of the bit to its left. I therefore numbered the bits as 2^-N.
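Written as a formula, with b_N standing for the bit numbered 2^-N (the name b_N is introduced here for illustration), a 56-bit mantissa has the value below. The example shows a mantissa whose first two bits are set:

```latex
m = \sum_{N=1}^{56} b_N \cdot 2^{-N}, \qquad b_N \in \{0, 1\}

\text{Example: } 0.11_2 = 1 \cdot 2^{-1} + 1 \cdot 2^{-2} = 0.5 + 0.25 = 0.75
```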

Multiplying those two mantissas yields a result of the same kind, in which the bits further to the right again contribute less to the value. If we multiply 2^-56 (decimal ca. 1E-17) by 2^-32, we get 2^-88 (decimal roughly 3E-27), which lies far below the resolution of our 56-bit result. To keep a few more bits than needed, we extend the result by 16 bits to the right and use those extra bits for rounding later.
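The orders of magnitude from this paragraph, worked out (values rounded):

```latex
2^{-56} \approx 1.4 \cdot 10^{-17}, \qquad
2^{-56} \cdot 2^{-32} = 2^{-88} \approx 3.2 \cdot 10^{-27}

56 + 16 = 72 \text{ bits} = 9 \text{ bytes}, \qquad
2^{-72} \approx 2.1 \cdot 10^{-22}
```

The full product of two 56-bit mantissas would be 112 bits long; the nine result bytes cover only its upper 72 bits.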

I named the bytes of the two mantissas Z1N and Z2N, where N runs from 1 to 7. I named the bytes of the result mantissa ResN, where N runs from 1 to 9 (seven bytes plus two extra bytes for rounding).

To multiply those mantissas in assembly language, I placed the 7 + 7 + 9 bytes in the AVR's registers (R2 to R15 for Z1 and Z2, R18 to R26 for the result). I used reverse order: the lower the register number, the higher the byte's value. And note that N starts with 1, not with 0. You could also place the numbers in SRAM, but that adds lots of additional instructions and increases the execution time considerably.

The math of multiplication

Each mantissa now consists of seven bytes, each of which is worth 2^-8 times the byte to its left.

If we want to multiply those, we have to consider the bytes' exponential multipliers: 2^-8, 2^-16, and so on down to 2^-56. The bytes, multiplied by their exponential multipliers, sum up to the numbers Z1 and Z2.
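As a formula, with Z1_N and Z2_N standing for the byte values (0 to 255) of Z1N and Z2N:

```latex
Z1 = \sum_{N=1}^{7} Z1_N \cdot 2^{-8N}
   = Z1_1 \cdot 2^{-8} + Z1_2 \cdot 2^{-16} + \dots + Z1_7 \cdot 2^{-56}

Z2 = \sum_{N=1}^{7} Z2_N \cdot 2^{-8N}
   = Z2_1 \cdot 2^{-8} + Z2_2 \cdot 2^{-16} + \dots + Z2_7 \cdot 2^{-56}
```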

As each number consists of seven bytes, each with its own exponential multiplier, we have to multiply each byte of Z1 by each byte of Z2, which makes 7 × 7 = 49 byte products.
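Multiplying the two sums term by term gives the double sum below. Here M is a second index (introduced for this formula) that numbers the bytes of Z2, so that each of the 49 byte products Z1_N · Z2_M gets the weight 2^{-8(N+M)}:

```latex
Z1 \cdot Z2 = \sum_{N=1}^{7} \sum_{M=1}^{7} \left(Z1_N \cdot Z2_M\right) \cdot 2^{-8(N+M)}

2 \le N + M \le 14 \;\Rightarrow\; 2^{-112} \le 2^{-8(N+M)} \le 2^{-16}
```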

(Figure: Multiplying the mantissas)

Here are the formulas for multiplying the two bracketed sums. As can be seen, every term Z1N * Z2M has to be calculated. As the exponential multipliers of two bytes are multiplied by adding their exponents, the products carry weights between 2^-16 and 2^-112. The terms with a red background contribute neither to the result nor to the rounding bytes and can be skipped.
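Which result bytes a product lands in follows from its weight. Each byte product is a 16-bit number that MUL returns in R1:R0, i.e. a high byte H and a low byte L (names used only here). Since result byte ResK has the weight 2^{-8K}:

```latex
Z1_N \cdot Z2_M = H \cdot 2^{8} + L, \qquad 0 \le H, L \le 255

\left(Z1_N \cdot Z2_M\right) \cdot 2^{-8(N+M)}
  = H \cdot 2^{-8(N+M-1)} + L \cdot 2^{-8(N+M)}
  \;\Rightarrow\; H \to \text{Res}(N+M-1), \quad L \to \text{Res}(N+M)

\text{Example } Z1_2 \cdot Z2_7:\; N+M = 9 \;\Rightarrow\; H \to \text{Res8}, \; L \to \text{Res9}
```

The example matches the `add rRes9,R0` / `adc rRes8,R1` pair in the listing. As Res9 is the last result byte, a product with N + M > 10 lies completely below it. The listing adds all 34 products with N + M ≤ 9 and, of the five products with N + M = 10, uses only the high byte of Z13 · Z27 (to initialize Res9); the remaining terms are not included, so Res9 is an approximation of the exact ninth byte.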

The result is the sum of all those products. To initialize this sum, we could simply clear the result bytes to zero. But by using the yellow-highlighted multiplications, we can skip this step: their products occupy different result registers, so they can simply be copied there. Only Res9 is a little more complicated, because it receives only the MSB of a product, without its LSB.
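The initialization works because the products used for it do not overlap. A product Z1N · Z2M has its MSB in Res(N+M-1) and its LSB in Res(N+M), as the register use in the listing shows. For the initializing multiplications of the listing this gives:

```latex
\begin{array}{lcl}
Z1_1 \cdot Z2_1 & (N+M = 2)  & \rightarrow \text{Res1 (MSB)}, \text{Res2 (LSB)} \\
Z1_3 \cdot Z2_1 & (N+M = 4)  & \rightarrow \text{Res3}, \text{Res4} \\
Z1_5 \cdot Z2_1 & (N+M = 6)  & \rightarrow \text{Res5}, \text{Res6} \\
Z1_7 \cdot Z2_1 & (N+M = 8)  & \rightarrow \text{Res7}, \text{Res8} \\
Z1_3 \cdot Z2_7 & (N+M = 10) & \rightarrow \text{Res9 (MSB only)}
\end{array}
```

Each of the nine result bytes is written exactly once, so the nine MOV instructions replace clearing the registers.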

Multiplying Z1N by Z2M uses the built-in hardware multiplier. The (unsigned) MUL instruction takes the two registers and delivers its 16-bit result in the register pair R1:R0. This result has to be added to the result registers at the appropriate position. As these additions can produce carries, the carries have to be propagated through all higher-value result bytes, up to Res1.
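A small worked example of such a carry, with made-up values: a product with N + M = 8 (MSB to Res7, LSB to Res8) of R1:R0 = 0x0001 is added while Res6, Res7 and Res8 hold 0x12, 0xFF and 0xFF:

```latex
\begin{array}{lrl}
\texttt{add rRes8,R0}:  & \mathtt{0xFF} + \mathtt{0x01}     & = \mathtt{0x00}, \; C = 1 \\
\texttt{adc rRes7,R1}:  & \mathtt{0xFF} + \mathtt{0x00} + 1 & = \mathtt{0x00}, \; C = 1 \\
\texttt{adc rRes6,rmp}: & \mathtt{0x12} + \mathtt{0x00} + 1 & = \mathtt{0x13}, \; C = 0
\end{array}
```

The following `adc rRes5,rmp` to `adc rRes1,rmp` add zero with C = 0 and leave those bytes unchanged. A single 1 added to the lowest byte has changed three result bytes.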

The assembler source code

The assembler source code is simple. The 56-bit numbers Z1 and Z2 are defined as constants at the top of the code and are written to the proper registers at the beginning.

Note that the code uses neither subroutines nor macros nor branches, so it is rather straightforward. When adding a product to the result registers, I used ADC with the zero in rmp to propagate any carries. Branching on carry clear (BRCC) as soon as no carry is left would increase the instruction count, but could decrease the execution time, depending on the numbers.
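How much such branching could save can be estimated by counting cycles. Consider one product whose carry chain has k `adc rResX,rmp` instructions, and a hypothetical variant (not part of the listing) that places a BRCC to a label behind the chain after `adc rResX,R1` and after every `adc rResX,rmp` except the last one. MUL takes 2 cycles, ADD and ADC 1 cycle, BRCC 1 cycle if not taken and 2 cycles if taken. If the carry dies out after j of the k carry additions:

```latex
t_{\text{listing}} = 2 + 1 + 1 + k = 4 + k

t_{\text{BRCC}}(j) = 4 + j + j + 2 = 6 + 2j \quad (0 \le j < k),
\qquad t_{\text{BRCC}}^{\max} = 4 + 2k

k = 7:\quad t_{\text{listing}} = 11, \quad t_{\text{BRCC}}(0) = 6, \quad
t_{\text{BRCC}}(1) = 8, \quad t_{\text{BRCC}}^{\max} = 18
```

So the variant saves cycles when carries stop early, costs up to k cycles more when a carry runs all the way up to Res1, and needs k additional BRCC instructions per product. A carry only continues past a byte that overflows from 0xFF to 0x00, so long chains should be rare, but the execution time then depends on the numbers.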

The code's 286 instructions (up to the final endless loop), of which the first 28 load the constants, take 321 µs at a clock frequency of 1 MHz. The execution time is independent of the numbers used.
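The execution time can be checked by counting cycles: each of the 286 instructions takes one clock cycle, except MUL, which takes two, and the listing contains 35 MUL instructions:

```latex
286 + 35 \cdot (2 - 1) = 321 \text{ cycles}, \qquad
\frac{321 \text{ cycles}}{1\,\text{MHz}} = 321\,\mu\text{s}
```

The 35 MULs are the 34 byte products Z1N · Z2M with N + M ≤ 9 plus Z13 · Z27 for Res9. As no instruction is skipped or repeated depending on the data, the count is the same for all numbers.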

The code:

;
; ****************************************
; * Multiply two 56-bit float mantissas  *
; * with a hardware multiplier, here m48 *
; * (C)2022 by Gerhard Schmidt           *
; ****************************************
;
.nolist
.include "m48adef.inc" ; Define device ATmega48A
.list
;
; Multiplies two 56-bit float mantissas by using
; the hardware multiplier
;
; Test numbers
.equ Z1 = 0x7FFFFFFFFFFFFF
.equ Z2 = 0x7FFFFFFFFFFFFF
;
; **********************************
;       R E G I S T E R S
; **********************************
;
; Used: R1:R0 for multiplication
.def rZ11 = R2 ; Z1, byte 1
.def rZ12 = R3 ; dto., byte 2
.def rZ13 = R4 ; dto., byte 3
.def rZ14 = R5 ; dto., byte 4
.def rZ15 = R6 ; dto., byte 5
.def rZ16 = R7 ; dto., byte 6
.def rZ17 = R8 ; dto., byte 7
.def rZ21 = R9 ; Z2, byte 1
.def rZ22 = R10 ; dto., byte 2
.def rZ23 = R11 ; dto., byte 3
.def rZ24 = R12 ; dto., byte 4
.def rZ25 = R13 ; dto., byte 5
.def rZ26 = R14 ; dto., byte 6
.def rZ27 = R15 ; dto., byte 7
.def rmp = R16 ; Define multipurpose register
.def rDummy = R17 ; Not used
.def rRes1 = R18 ; Result, byte 1
.def rRes2 = R19 ; dto., byte 2
.def rRes3 = R20 ; dto., byte 3
.def rRes4 = R21 ; dto., byte 4
.def rRes5 = R22 ; dto., byte 5
.def rRes6 = R23 ; dto., byte 6
.def rRes7 = R24 ; dto., byte 7
.def rRes8 = R25 ; dto., byte 8, rounding MSB
.def rRes9 = R26 ; dto., byte 9, rounding LSB
; free: R27 to R31
;
.cseg
.org 000000
; **********************************
;  M A I N   P R O G R A M   I N I T
; **********************************
;
Main:
; Init the mantissas
  ldi rmp,Byte3(Z1 / (1<<32))
  mov rZ11,rmp
  ldi rmp,Byte2(Z1 / (1<<32))
  mov rZ12,rmp
  ldi rmp,Byte1(Z1 / (1<<32))
  mov rZ13,rmp
  ldi rmp,Byte4(Z1)
  mov rZ14,rmp
  ldi rmp,Byte3(Z1)
  mov rZ15,rmp
  ldi rmp,Byte2(Z1)
  mov rZ16,rmp
  ldi rmp,Byte1(Z1)
  mov rZ17,rmp
  ldi rmp,Byte3(Z2 / (1<<32))
  mov rZ21,rmp
  ldi rmp,Byte2(Z2 / (1<<32))
  mov rZ22,rmp
  ldi rmp,Byte1(Z2 / (1<<32))
  mov rZ23,rmp
  ldi rmp,Byte4(Z2)
  mov rZ24,rmp
  ldi rmp,Byte3(Z2)
  mov rZ25,rmp
  ldi rmp,Byte2(Z2)
  mov rZ26,rmp
  ldi rmp,Byte1(Z2)
  mov rZ27,rmp
;
; Multiply the mantissas
  mul rZ11,rZ21 ; Init the result registers
  mov rRes2,R0
  mov rRes1,R1
  mul rZ13,rZ21
  mov rRes4,R0
  mov rRes3,R1
  mul rZ15,rZ21
  mov rRes6,R0
  mov rRes5,R1
  mul rZ17,rZ21
  mov rRes8,R0
  mov rRes7,R1
  mul rZ13,rZ27 ; N+M = 10: only the MSB falls into the result
  mov rRes9,R1 ; MSB to Res9, the LSB is not kept
  ; Multiply Z11
  clr rmp ; Adder for carry
  mul rZ11,rZ22
  add rRes3,R0
  adc rRes2,R1
  adc rRes1,rmp
  mul rZ11,rZ23
  add rRes4,R0
  adc rRes3,R1
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ11,rZ24
  add rRes5,R0
  adc rRes4,R1
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ11,rZ25
  add rRes6,R0
  adc rRes5,R1
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ11,rZ26
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ11,rZ27
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z12
  mul rZ12,rZ21
  add rRes3,R0
  adc rRes2,R1
  adc rRes1,rmp
  mul rZ12,rZ22
  add rRes4,R0
  adc rRes3,R1
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ12,rZ23
  add rRes5,R0
  adc rRes4,R1
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ12,rZ24
  add rRes6,R0
  adc rRes5,R1
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ12,rZ25
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ12,rZ26
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ12,rZ27
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z13
  mul rZ13,rZ22
  add rRes5,R0
  adc rRes4,R1
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ13,rZ23
  add rRes6,R0
  adc rRes5,R1
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ13,rZ24
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ13,rZ25
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ13,rZ26
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z14
  mul rZ14,rZ21
  add rRes5,R0
  adc rRes4,R1
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ14,rZ22
  add rRes6,R0
  adc rRes5,R1
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ14,rZ23
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ14,rZ24
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ14,rZ25
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z15
  mul rZ15,rZ22
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ15,rZ23
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ15,rZ24
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z16
  mul rZ16,rZ21
  add rRes7,R0
  adc rRes6,R1
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ16,rZ22
  add rRes8,R0
  adc rRes7,R1
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  mul rZ16,rZ23
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Multiply Z17
  mul rZ17,rZ22
  add rRes9,R0
  adc rRes8,R1
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
  ; Rounding to 56 bits length: add 0x80 to rRes9 and rRes8,
  ; then propagate the carry up to rRes1
  ldi rmp,0x80
  add rRes9,rmp
  adc rRes8,rmp
  ldi rmp,0
  adc rRes7,rmp
  adc rRes6,rmp
  adc rRes5,rmp
  adc rRes4,rmp
  adc rRes3,rmp
  adc rRes2,rmp
  adc rRes1,rmp
Loop:
  rjmp Loop ; Endless loop, the 56-bit result is in rRes1 to rRes7
;
; End of source code
;
; Copyright information
  .db "(C)2022 by Gerhard Schmidt  " ; Source code readable
  .db "C(2)20 2ybG reahdrS hcimtd  " ; Machine code format

Conclusion
Multiplying two 56-bit mantissas takes more than 300 µs at 1 MHz and requires more than 250 instructions. If you want to waste your controller's time and if you really need 15-decimal-digit accuracy for your numbers: use 64-bit floats and be happy.

All others: hands off, switch your brain on and use some kind of pseudo-float math instead.


©2022 by http://www.avr-asm-tutorial.net