Not so, if you use assembler. Here you'll be shown how to multiply a fixed-point real number in less than 60 microseconds, in special cases even within 18 microseconds, at a clock frequency of 4 MHz. Without any floating point processor extensions and other expensive tricks for people too lazy to use their brain.

How to do that? Back to the roots of math! Most tasks with floating point reals can be done using integer numbers. Integers are easy to program in assembler and perform fast. The decimal point exists only in the brain of the programmer and is inserted somewhere into the stream of decimal digits. No one notices that this is a trick.

The electronics world sometimes is more complicated. E.g., the AD converter returns an 8-bit binary value for input voltages between 0.00 and 5.00 Volt. Now we're tricked and do not know how to proceed. To display the correct result on the LCD we would have to multiply the binary by 500/255, which is 1.9608. This is a silly number, as it is almost 2, but only almost. And we don't want that kind of inaccuracy of 2%, while we have an AD converter with around 0.25% accuracy.

To cope with this, we multiply the input by 500/255*256 or 501.96 and divide the result by 256. Why first multiply by 256 and then divide by 256? It's just for enhanced accuracy. If we multiply the input by 502 instead of 501.96, the error is only in the order of 0.008%. That is good enough for our AD converter, we can live with that. And dividing by 256 is an easy task, because it is a well-known power of 2. When dividing by numbers that are a power of 2, the AVR feels very comfortable and performs very fast. When dividing by 256, the AVR is even faster, because we just have to skip the last byte of the binary number. Not even shift and rotate!

The multiplication of an 8-bit binary with the 9-bit binary 502 (hex 1F6) can have a result wider than 16 bits. So we have to reserve 24 bits or three registers for the result. During multiplication, the constant 502 has to be shifted left (multiplied by 2), so that it can be added to the result each time a one rolls out of the input number. As this might need eight shifts left, we need a further three bytes for this constant. So we choose the following combination of registers for the multiplication:

Number | Value (example) | Register |
---|---|---|
Input value | 255 | R1 |
Multiplier | 502 | R4:R3:R2 |
Result | 128,010 | R7:R6:R5 |

- Test if the input number is already zero. If yes, we're done.
- If no, one bit of the input number is shifted out of the register to the right, into the carry, while a zero is stuffed into bit 7. This instruction is named Logical Shift Right or LSR.
- If the bit in the carry is a one, we add the multiplier (in the first round the value 502, in the second round 1004, and so on) to the result. While adding, we take care of any carry (adding R2 to R5 with ADD, adding R3 to R6 and R4 to R7 with the ADC instruction!). If the bit in the carry was a zero, we just don't add the multiplier to the result and jump to the next step.
- Now the multiplier is multiplied by 2, because the next bit shifted out of the input number is worth twice as much. So we shift R2 to the left (inserting a zero in bit 0) using LSL. Bit 7 is shifted to the carry. Then we rotate this carry into R3 with ROL, rotating its content left by one bit, and its bit 7 to the carry. The same with R4.
- Now we're done with one digit of the input number, and we proceed with step 1 again.
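
The steps above can be sketched in assembler as follows. This is a minimal sketch and not the original routine: the registers follow the table above, while R16 as a helper register, the constant name cMult502 and the two labels are assumptions made for this illustration.

```
; Sketch: 8-bit input in R1 times 502, 24-bit result in R7:R6:R5
.equ cMult502 = 502 ; Assumed name for the constant
ldi R16,Low(cMult502) ; Multiplier, byte 0
mov R2,R16
ldi R16,High(cMult502) ; Multiplier, byte 1
mov R3,R16
clr R4 ; Multiplier, byte 2
clr R5 ; Clear the 24-bit result
clr R6
clr R7
Mul502Loop:
tst R1 ; Input number already zero?
breq Mul502Done ; Yes, we're done
lsr R1 ; Lowest bit of the input to the carry
brcc Mul502Shift ; Bit was zero: do not add
add R5,R2 ; Add the multiplier to the result, byte 0
adc R6,R3 ; byte 1 plus carry
adc R7,R4 ; byte 2 plus carry
Mul502Shift:
lsl R2 ; Multiply the multiplier by 2
rol R3
rol R4
rjmp Mul502Loop ; Next bit of the input
Mul502Done:
```

With the example value 255 in R1, all eight bits cause an addition, and R7:R6:R5 ends up holding 128,010 as in the table. Dividing by 256 then simply means ignoring R5.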

The whole program, from the input number to the resulting ASCII string, requires between 79 and 228 clock cycles, depending on the input number. Those who want to beat this with the floating point routine of a more sophisticated language than assembler, feel free to mail me your conversion time (and program flash and memory usage).

Due to the expanded accuracy, the program needs some more registers. It is a bit slower, but not much. My offer above to Basic, C and Pascal programmers still stands, if you can beat this.

- A voltage divider divides an input voltage of up to 30 V down to a lower level for measurement with an ADC.
- The AD converter has the internal voltage reference of 1.1 V turned on. (If your AVR provides a 2.56 V reference voltage only: just read on and adjust some parameters.)
- The voltage is to be displayed as a string with 0.00V to 30.00V.

With 10 kΩ and 270 kΩ the voltage divider produces, with a measuring current of up to I

But: as our resistors are better than 1%, we might want the result to be better than 1%, too. We now multiply the 3.0078 by 256 and get exactly 770. (The 3.0078 is the divider ratio of 28 times the 1.1 V reference times 100, for a display in hundredths of a volt, divided by the 1,024 steps of the 10-bit ADC.) If we multiply the AD result by 770 and divide the result by 256 (by simply skipping the last byte of the multiplication result, or rather by using it for rounding), we'll get the result with an accuracy of 1%.

The table shows the voltage calculation. Depending on the input voltage, the ADC result follows from dividing that voltage by the resistor divider ratio and by the reference voltage. We see from the first three entry lines that the ADC result changes by three to four for each 0.1 V change of the input voltage. Multiplication by 770 yields a 24-bit binary result, which the table shows in decimal and in hexadecimal format. Now we skip the last byte of the multiplication result, using it only for rounding, and get a 16-bit result. If we convert this binary to decimal, we'll get decimals that look much like our desired result. We skip the first digit of the result if it is zero. And now we smuggle the decimal dot into the third position of the digits, and add a V to the end of the string, so we get the desired output string.

The table demonstrates that only the last digit of the display differs by +/-1 from the correct voltage. That means that our accuracy is better than 1%.

Unfortunately we'll have to perform a 16-bit by 16-bit multiplication, because both the AD result and the factor 770 can be larger than 255. But the result cannot exceed 0x0C04FE and is only 20 bits wide (three bytes).

The picture shows how this is done.

- First the two LSBs are multiplied with the instruction MUL. The result is in the register pair R1:R0. Both bytes are copied to the two lowest registers of the result.
- Then the two MSBs are multiplied. The LSB of the result in R0 is copied to the third of the result registers (thereby multiplying it by 65,536). The MSB in R1 is zero, so we skip that.
- The next two steps each multiply the MSB of one number with the LSB of the other number. The results in R1:R0 are added to the second and third result registers (thereby multiplying them by 256). ADD adds the LSB in R0, ADC the MSB in R1 plus the carry.

```
; Constant
.equ cMult = 770
; Registers
.def rAdcL = R2 ; Loaded with the LSB of the ADC result
.def rAdcH = R3 ; ditto, MSB
.def rMultL = R4 ; Loaded with the constant, LSB
.def rMultH = R5 ; ditto, MSB
.def rRes0 = R6 ; Result byte 0, used for rounding
.def rRes1 = R7 ; Result byte 1, LSB of the result
.def rRes2 = R8 ; Result byte 2, MSB of the result
.def rmp = R16 ; Multi-purpose register (assumed, definition not shown here)
;
; Loading the constant
ldi R16,Low(cMult) ; Load constant, LSB
mov rMultL,R16
ldi R16,High(cMult) ; ditto, MSB
mov rMultH,R16
;
; Multiplication of the two LSBs
mul rAdcL,rMultL ; Multiplying the LSBs
mov rRes0,R0 ; Copy to result, LSB
mov rRes1,R1 ; ditto, MSB
; Multiplication of the two MSBs
mul rAdcH,rMultH ; Multiplying the MSBs
mov rRes2,R0 ; Copy the 65,536-fold to the result, LSB only
; Multiplication of the LSB with the MSB
mul rAdcL,rMultH ; Multiplying LSB by MSB
add rRes1,R0 ; Adding the 256-fold to the result, LSB
adc rRes2,R1 ; ditto, MSB
; Multiplication of the MSB with the LSB
mul rAdcH,rMultL ; Multiplying MSB with LSB
add rRes1,R0 ; Adding the 256-fold to the result, LSB
adc rRes2,R1 ; ditto, MSB
; Rounding with the lowest byte
ldi rmp,0x7F ; Round result
add rRes0,rmp
brcc ToSram ; No carry: nothing to round up
ldi rmp,0 ; LDI leaves the carry untouched
adc rRes1,rmp ; Add the carry to the result, LSB
adc rRes2,rmp ; ditto, MSB
ToSram:
```

The complete multiplication requires only 25 µs at a clock of 1 MHz, which is very fast.

The constant 770 is 11 0000 0010 in binary, so only three of its bits are set (770 = 512 + 256 + 2). That means we do not have to shift through all the zeros, because only three ADDs and one left shift are required. We'll see if this is faster than hardware multiplication with its 25 µs.

The source code goes like this:

```
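mov rRes0,rAdcL ; Result = ADC value (1-fold), LSB
mov rRes1,rAdcH ; ditto, MSB
add rRes1,rAdcL ; Add the 256-fold, LSB
mov rRes2,rAdcH ; ditto, MSB (MOV keeps the carry)
ldi rmp,0
adc rRes2,rmp ; Add the carry, result is now the 257-fold
lsl rRes0 ; Shift left once: 514-fold
rol rRes1
rol rRes2
add rRes1,rAdcL ; Add the 256-fold again: 770-fold
adc rRes2,rAdcH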
```

The complete operation, including final rounding, needs only 17 µs, so it is considerably faster than hardware multiplication. And: any AVR can do that, not only ATmega types. Conclusion: sometimes a specialized multiplication is faster than hardware multiplication.

Those who need a more general solution, for numbers other than 770, can use the following multiplication algorithm:

```
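; Loading the constant
ldi R16,Low(cMult) ; Load constant, LSB
mov rMultL,R16
ldi R16,High(cMult) ; ditto, MSB
mov rMultH,R16
; Multiplication
clr R0 ; Third byte of the shifted ADC value
clr rRes0 ; Clear the result
clr rRes1
clr rRes2
Shift:
lsr rMultH ; Shift the lowest bit of the constant to the carry
ror rMultL
brcc Shift1 ; Bit is zero: do not add
add rRes0,rAdcL ; Bit is one: add the shifted ADC value
adc rRes1,rAdcH
adc rRes2,R0
Shift1:
lsl rAdcL ; Multiply the ADC value by 2
rol rAdcH
rol R0
tst rMultL ; Any bits left in the constant?
brne Shift ; Yes, repeat
tst rMultH
brne Shift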
```

What looks rather short needs a lot of time: 121 µs. Here, the hardware multiplier can save a lot of µs.

The SRAM is a good place for such character strings. For this we need a pointer. We could do without one as well, but with a pointer it is more elegant.

The decimal conversion repeats the same scheme for each digit. First we subtract 1,000 until an underflow occurs, and we count how often this can be done without underflow (here: three times). Finally the 1,000 is added again to compensate for the underflowing step. The digit is then checked for zero: if so, a blank is written to the string instead of the zero.

Then we subtract 100 from the number until an underflow occurs. After writing the resulting character to the string we smuggle the decimal dot into the string.

The tens stage is shorter, because the remaining number is only 8 bits wide.

Even simpler is the last digit: here we just add the ASCII zero to the remainder. And finally we append a V.

```
ToSram:
ldi ZH,High(sDecVtg) ; Pointer to SRAM, MSB
ldi ZL,Low(sDecVtg) ; dto., LSB
ldi rmp,Low(1000) ; Start with 1,000
mov R0,rmp
ldi rmp,High(1000)
mov R1,rmp
ldi rmp,'0'-1 ; Counter
ToSram1:
inc rmp
sub rRes1,R0
sbc rRes2,R1
brcc ToSram1
add rRes1,R0 ; Undo last subtract, LSB
adc rRes2,R1 ; dto., MSB
cpi rmp,'0' ; Leading zero?
brne ToSram2 ; No
ldi rmp,' '
ToSram2:
st Z+,rmp ; Write character to SRAM
ldi rmp,Low(100) ; Continue with 100
mov R0,rmp
ldi rmp,High(100)
mov R1,rmp
ldi rmp,'0'-1
ToSram3:
inc rmp
sub rRes1,R0
sbc rRes2,R1
brcc ToSram3
add rRes1,R0
adc rRes2,R1
st Z+,rmp
ldi rmp,'.' ; Add decimal dot
st Z+,rmp
ldi rmp,10 ; Continue with 10
mov R0,rmp
ldi rmp,'0'-1
ToSram4:
inc rmp
sub rRes1,R0
brcc ToSram4
st Z+,rmp
add rRes1,R0
ldi rmp,'0' ; Last digit
add rmp,rRes1
st Z+,rmp
ldi rmp,'V'
st Z,rmp
```
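
The routine above writes to the label sDecVtg, whose definition is not shown here. A minimal sketch of how such a buffer could be reserved in SRAM, assuming that six characters are enough for a string like "30.00V" and that no terminating character is needed:

```
; Sketch: reserve the string buffer in SRAM (assumed layout)
.dseg
sDecVtg:
.byte 6 ; Room for six characters, e.g. "30.00V"
.cseg
```

The routine stores exactly six characters: two digits, the dot, two digits and the V.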

The complete conversion lasts 93 µs at 1 MHz, which is not too long either.

The software is available as source code here.

The negative aspect of this solution is that a regulated operating voltage is now required: at 3.3 V the 82k has to be smaller, at 3.0 V even much smaller. So this is not that flexible any more, the operating voltage has to be fixed. In no case should the ADC input get negative, because this would permanently damage the pin.

That is what the ADC delivers as a result for input voltages between -15 V and +15 V. One can see that at 0.00 V the ADC does not produce 512, but a little bit less, 490. This results from the fact that at 0.00 V a small current flows through the 270k from the ADC voltage into the input. That makes the difference of 22. The red line for the 0.00 V input voltage does not land at 512, but a few digits below.

Those who need different input voltage ranges, operating voltages and combinations of resistors can play with the LibreOffice-Calc spreadsheet here.

- with the zero flag set, the string gets "0.00",
- with a clear carry flag, this difference has to be multiplied by 1,500 / 446 = 3.3632287, to come to the 15.00 V to be displayed (better: 15.00V). As this is not an integer, we'll use the 256-fold, which is 861. Exactly that would be 860.987, but 861 is only 0.00156% higher. And that is more than 600-fold smaller than the 1% difference caused by the resistor's error range. From the 24-bit result of (N_ADC - 490) * 861 we again use only the upper 16 bits, while the lower 8 bits are only used for rounding. If we convert the binary result to decimal, smuggle the dot into this decimal and add a V, we end up with a string of "xx.xxV",
- with the carry flag set, the resulting voltage is in any case negative and starts with a "-". By applying the instruction NEG, the LSB and the MSB are turned into a positive number, which is multiplied by the same factor. Here, we again use the lower 8 bits for rounding, and convert the binary to a decimal string with the smuggled-in ".", a preceding "-" and a following V.
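
The three cases can be sketched as follows. This is only an illustration under assumptions: that the offset of 490 is subtracted with SUB and SBC immediately before the flags are tested, that the register names are those of the multiplication code above with rmp as R16, and that the constant cZero and the labels exist. None of these appear in the source shown here.

```
; Sketch, not the original routine
.equ cZero = 490 ; ADC reading at 0.00 V input (from the text)
.def rAdcL = R2 ; ADC result, LSB (repeated so the sketch stands alone)
.def rAdcH = R3 ; ditto, MSB
.def rmp = R16 ; Multi-purpose register (assumed)
;
ldi rmp,Low(cZero)
sub rAdcL,rmp ; Subtract the offset, LSB
ldi rmp,High(cZero)
sbc rAdcH,rmp ; ditto, MSB with borrow
breq VoltZero ; Zero flag set: the 16-bit difference is zero
brcc VoltPos ; Carry clear: positive voltage
; Carry set: negative voltage, negate the 16-bit difference
neg rAdcH ; Negate the MSB
neg rAdcL ; Negate the LSB, sets the carry if the LSB was not zero
ldi rmp,0 ; LDI leaves the carry untouched
sbc rAdcH,rmp ; Subtract the borrow from the MSB
; (hypothetical) remember the "-" sign here, then continue
VoltPos:
; (hypothetical) multiply by 861, round, convert to decimal
rjmp VoltDone
VoltZero:
; (hypothetical) write "0.00" to the string
VoltDone:
```

SBC leaves the zero flag set only if the result of the preceding SUB was zero as well, so the BREQ tests the whole 16-bit difference and not just the MSB.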

©2003-2022 by http://www.avr-asm-tutorial.net