
Conversion to fixed decimal numbers in AVR assembler


Sense and Nonsense of floating decimal fractions

First: Do not use any floating points unless you really need them. Floating points are resource killers in an AVR, lame ducks, and need extreme execution times. You run into this dilemma if you think assembler is too complicated and prefer Basic or other languages like C and Pascal.
Not so if you use assembler. You'll be shown here how to perform the multiplication of a fixed point real number in less than 60 microseconds, in special cases even within 18 microseconds, at a clock frequency of 4 MHz. Without any floating point processor extensions or other expensive tricks for people too lazy to use their brain.

How to do that? Back to the roots of math! Most tasks with floating point reals can be done using integer numbers. Integers are easy to program in assembler and perform fast. The decimal point exists only in the brain of the programmer and is added somewhere in the decimal digit stream. No one notices that this is a trick.


Linear conversions

As an example, take the following task: an 8-bit AD converter measures an input signal in the range from 0.00 to 2.55 Volt and returns a binary between $00 and $FF as the result. The result, a voltage, is to be displayed on an LCD. A silly example, as it is so easy: the binary is converted to a decimal ASCII string between 000 and 255, and the decimal point is inserted just behind the first digit. Done!
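
A minimal sketch of this simple case in AVR assembler. The register choice (R16 for the AD value, R17 as character register) and the SRAM address sVolt are assumptions made for this illustration, they are not taken from the original program:

```asm
; Sketch: AD value 0..255 to the string "d.dd" (e.g. 255 -> "2.55")
; Assumptions: AD value in R16, R17 as character register,
; sVolt is a free SRAM address (0x0060 = SRAM start on classic AVRs)
.equ sVolt = 0x0060
  ldi R31,High(sVolt) ; Z pointer to the string, MSB (R31 = ZH)
  ldi R30,Low(sVolt) ; dto., LSB (R30 = ZL)
  ldi R17,'0'-1 ; Digit counter in ASCII
Hundreds:
  inc R17 ; Count the subtractions
  subi R16,100 ; Subtract 100
  brcc Hundreds ; No underflow, repeat
  subi R16,-100 ; Undo the last subtraction
  st Z+,R17 ; First digit: the volts
  ldi R17,'.'
  st Z+,R17 ; Decimal point behind the first digit
  ldi R17,'0'-1
Tens:
  inc R17
  subi R16,10 ; Subtract 10
  brcc Tens ; No underflow, repeat
  subi R16,-10 ; Undo the last subtraction
  st Z+,R17 ; Second digit
  subi R16,-'0' ; Remainder to ASCII
  st Z,R16 ; Third digit
```

The string consists of four characters: one digit, the decimal point and two further digits.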

The electronics world is sometimes more complicated. E.g., the AD converter returns an 8-bit binary for input voltages between 0.00 and 5.00 Volt. Now we're tricked and do not know how to proceed. To display the correct result on the LCD we would have to multiply the binary by 500/255, which is 1.9608. This is a silly number, as it is almost 2, but only almost. And we don't want that kind of inaccuracy of 2% while we have an AD converter with around 0.25% accuracy.

To cope with this, we multiply the input by 500/255*256 or 501.96 and divide the result by 256. Why first multiply by 256 and then divide by 256? It's just for enhanced accuracy. If we multiply the input by 502 instead of 501.96, the error is only in the order of 0.008%. That is good enough for our AD converter, we can live with that. And dividing by 256 is an easy task, because it is a well-known power of 2. When dividing by numbers that are a power of 2, the AVR feels very comfortable and performs very fast. When dividing by 256, the AVR is even faster, because we just have to skip the last byte of the binary number. Not even a shift or rotate is needed!
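
The numbers behind this choice, written out as a short worked calculation (the sample input value 128 is picked only for illustration):

```latex
\[
\frac{500}{255}\cdot 256 = 501.96\ldots \approx 502,
\qquad
\frac{502 - 501.96}{501.96} \approx 0.008\,\%
\]
\[
128 \cdot 502 = 64256, \qquad \frac{64256}{256} = 251 \;\Rightarrow\; 2.51\ \text{V}
\qquad \left(\text{exact: } 128\cdot\frac{5.00\ \text{V}}{255} = 2.5098\ \text{V}\right)
\]
```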

The multiplication of an 8-bit binary by the 9-bit binary 502 (hex 1F6) can have a result wider than 16 bits. So we have to reserve 24 bits or 3 registers for the result. During multiplication, the constant 502 has to be shifted left (multiplication by 2), so that it can be added to the result each time a one rolls out of the input number. As this might need eight shifts left, we need a further three bytes for this constant. So we choose the following combination of registers for the multiplication:
Number          Value (example)   Register
Input value     255               R1
Multiplier      502               R4:R3:R2
Result          128,010           R7:R6:R5
After filling the value 502 (00.01.F6) to R4:R3:R2 and clearing the result registers R7:R6:R5 the multiplication goes like this:
  1. Test, if the input number is already zero. If yes, we're done.
  2. If not, one bit of the input number is shifted out of the register to the right, into the carry, while a zero is stuffed into bit 7. This instruction is named Logical Shift Right or LSR.
  3. If the bit in the carry is a one, we add the multiplier (in the first pass the value 502, in the second pass 1004, and so on) to the result. While adding, we take care of any carry (adding R2 to R5 with ADD, adding R3 to R6 and R4 to R7 with the ADC instruction!). If the bit in the carry was a zero, we just don't add the multiplier to the result and jump to the next step.
  4. Now the multiplier is multiplied by 2, because the next bit shifted out of the input number is worth twice as much. So we shift R2 to the left (inserting a zero in bit 0) using LSL. Bit 7 is shifted to the carry. Then we rotate this carry into R3 with ROL, rotating its content left by one bit, and bit 7 to the carry. The same with R4.
  5. Now we're done with one bit of the input number, and we proceed with step 1 again.
The result of the multiplication by 502 is now in the result registers R7:R6:R5. If we just ignore register R5 (division by 256), we have our desired result. To enhance accuracy, we can use bit 7 in R5 to round the result. Now we just have to convert the result from its binary form to decimal ASCII (see Conversion bin to decimal-ASCII). If we add a decimal point in the right place in the ASCII string, our voltage string is ready for the display.
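
The five steps and the rounding can be sketched in AVR assembler as follows. This is a minimal sketch, not the optimized program of Example 1; the label names, the constant name cFactor and the use of R16 as helper register are assumptions:

```asm
; Sketch: multiply the 8-bit value in R1 by 502, result in R7:R6:R5
.equ cFactor = 502 ; 500/255*256, rounded
  ldi R16,Low(cFactor) ; Multiplier, byte 0
  mov R2,R16
  ldi R16,High(cFactor) ; dto., byte 1
  mov R3,R16
  clr R4 ; dto., byte 2
  clr R5 ; Clear the result
  clr R6
  clr R7
MulLoop:
  tst R1 ; Step 1: input number already zero?
  breq MulRound ; Yes, done
  lsr R1 ; Step 2: lowest bit to carry, zero into bit 7
  brcc MulShift ; Step 3: bit was zero, do not add
  add R5,R2 ; Bit was one: add the multiplier to the result
  adc R6,R3
  adc R7,R4
MulShift:
  lsl R2 ; Step 4: multiplier times two
  rol R3
  rol R4
  rjmp MulLoop ; Step 5: next bit
MulRound:
  ldi R16,0x80
  add R5,R16 ; Carry is set if bit 7 of R5 was one
  ldi R16,0 ; LDI leaves the carry untouched
  adc R6,R16 ; Round up, LSB
  adc R7,R16 ; dto., MSB
; R7:R6 now holds the rounded result, R5 is skipped
```

For the input value 255 the loop produces 0x01F40A (128,010) in R7:R6:R5, and R7:R6 holds 0x01F4 or decimal 500, which is displayed as 5.00.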

The whole program, from the input number to the resulting ASCII string, requires between 79 and 228 clock cycles, depending on the input number. Those who want to beat this with the floating point routine of a more sophisticated language than assembler, feel free to mail me your conversion time (and program flash and memory usage).


Example 1: 8-bit-AD-converter with fixed decimal output

The program described above is available, slightly optimized, in HTML form or as an assembler source code file. The source has all necessary routines for the conversion in a compact form, to be exported to other programs. The head of the program is a test setting, so you can test the program in the simulator.


Example 2: 10-bit-AD-converter with fixed decimal output

8-bit AD converters are rare, 10 bits are more common. Because 10 bits are more accurate, the conversion is done to yield a four-digit decimal. Don't be surprised if the last digit is not very stable. The program is available in HTML form and as an assembler source code file.

Because of the expanded accuracy, the program needs some more registers. It is a bit slower, but not much. My offer above to Basic, C and Pascal programmers still stands, if you can beat this.


Example 3: 10-bit-AD converter with internal reference voltage for voltage measurements

Let's look at a more complex task:

Voltage prescaler

[Figure: 30V prescaler]

First of all: the voltage prescaler. This has to divide the input voltage of up to 30 V down to the 1.1 V of the reference voltage. This is done with two resistors. The two resistors R1 (connected to ground) and R2 (connected to the input voltage) carry a current of Imeas = Vref / R1 = (30.0 - Vref) / R2. If we select an R1 of 10 kΩ, then R2 = (30 - 1.1) / 1.1 * 10 kΩ = 262.7 kΩ. That sounds like 270 kΩ in the E12 row of resistors, which puts the upper scale end slightly above 30.00 V.

With 10 kΩ and 270 kΩ the voltage divider produces, with a measuring current of up to Imeas = 1.1 V / 10 kΩ = 0.11 mA, a voltage drop across R2 of up to 29.7 V. With the reference voltage added, the scale end of the divider is at 30.80 V.
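
The dimensioning of the divider, written out as a worked calculation (the symbol V_max for the scale end is introduced here only for illustration):

```latex
\[
R_2 = \frac{30\,\text{V} - 1.1\,\text{V}}{1.1\,\text{V}} \cdot 10\,\text{k}\Omega
    = 262.7\,\text{k}\Omega
\;\rightarrow\; 270\,\text{k}\Omega \ \text{(E12)}
\]
\[
V_{max} = V_{ref}\cdot\frac{R_1 + R_2}{R_1}
        = 1.1\,\text{V}\cdot\frac{10 + 270}{10} = 30.8\,\text{V}
\]
```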

Conversion of the AD result to a voltage

If the AD converter sees a voltage of 1.1 Volt on its input, its result would be 1,024 (more exactly: if the voltage is 1.1 - 1.1 / 1,024 Volt, the result is 1,023). The resulting voltage display for 1,024 should be 30.80V. Forget the dot for a while: it is smuggled in later on. That means we have to multiply the 1,024 by 3.0078 to arrive at 3,080. We could now add the AD result two times to itself and get 3,072, and say Good Bye to the small difference of 80 mV in the voltage display.

[Figure: Calculating the dividers]

But: as our resistors are better than 1%, we might want the result to be better than 1%, too. We now multiply the 3.0078 (more exactly: 3,080 / 1,024) by 256 and get exactly 770. If we multiply the AD result by that and divide the result by 256 (by simply skipping the last byte of the multiplication result, or rather using it for rounding), we'll get the result with an accuracy of 1%.
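
As a worked calculation for the factor and for the largest possible AD result of 1,023:

```latex
\[
\frac{3080}{1024} = 3.0078125, \qquad 3.0078125 \cdot 256 = 770
\]
\[
1023 \cdot 770 = 787710 = \text{0x0C04FE}
\;\Rightarrow\; \text{divided by 256 and rounded: } \text{0x0C05} = 3077
\;\Rightarrow\; \text{30.77V}
\]
```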

The table shows the voltage calculation. Depending on the input voltage, the ADC result follows from the resistor divider and the reference voltage. We see from the first three entry lines that the ADC result changes by three to four for each 0.1 V change of the input voltage. Multiplication by 770 yields a 24-bit binary result, which the table shows in decimal and in hexadecimal format. Now we skip the last byte of the multiplication result, use it for rounding, and get a 16-bit result. If we convert this binary to decimal, we get decimals that look much like our desired result. We skip the first digit of the result if it is zero. And now we smuggle the decimal dot in as the third character and add a V to the end of the string, so we get the desired output string.

The table demonstrates that only the last digit of the display differs by +/-1 from the correct voltage. That means that our accuracy is better than 1%.

Unfortunately we'll have to perform a 16-bit by 16-bit multiplication, because both the AD result and the factor 770 can be larger than 255. But the result cannot exceed 0x0C04FE and is only 20 bits wide, so it fits into three bytes.

Multiplication with the hardware multiplier

If you use an ATmega, the multiplication with the built-in hardware multiplier is straightforward. But, as both numbers are 16 bits wide, we need to perform a 16-by-16-bit multiplication. Multiplying number 1, 256*MSB1 + LSB1, by number 2, 256*MSB2 + LSB2, yields the term 65536*MSB1*MSB2 + 256*MSB1*LSB2 + 256*MSB2*LSB1 + LSB1*LSB2, so we need four multiplications. The multiplication by 65,536 and by 256 is done by shifting the partial result two bytes or one byte to the left, that is by moving it to or adding it to the higher bytes of the result. If the MSB of a partial result cannot be larger than zero (as when multiplying both MSBs here), we can skip that MSB.

[Figure: Hardware multiplication 16*16-bit]

The picture shows how this is done.
  1. First the two LSBs are multiplied with the instruction MUL. The result is in the register pair R1:R0. Those are copied to the two lowest registers of the result.
  2. Then the two MSBs are multiplied. The LSB of the result in R0 is copied to the third of the result registers (by that multiplying the result by 65,536). The MSB in R1 is zero, we skip that.
  3. The next two steps multiply the MSB of one number by the LSB of the other number. The results in R1:R0 are added to the second and third result register (by that multiplying them by 256). ADD adds the LSB in R0, ADC the MSB in R1 plus the carry.
This is what the source code looks like:

; Constant
.equ cMult = 770
; Registers
.def rAdcL = R2 ; Loaded with the LSB of the ADC result
.def rAdcH = R3 ; dto., MSB
.def rMultL = R4 ; Loaded with the constant, LSB
.def rMultH = R5 ; dto., MSB
.def rRes0 = R6 ; Result byte 0, used for rounding
.def rRes1 = R7 ; dto., 1, LSB result
.def rRes2 = R8 ; dto., 2, MSB result
.def rmp = R16 ; Multi-purpose register (R16 is assumed, the listing uses rmp without defining it)
;
; Loading the constant
  ldi R16,Low(cMult) ; Load constant, LSB
  mov rMultL,R16
  ldi R16,High(cMult) ; dto., MSB
  mov rMultH,R16
;
; Multiplication of the two LSBs
  mul rAdcL,rMultL ; Multiplying the LSBs
  mov rRes0,R0 ; Copy to result, LSB
  mov rRes1,R1 ; dto., MSB
; Multiplication of the two MSBs
  mul rAdcH,rMultH ; Multiplying the MSBs
  mov rRes2,R0 ; Copy the 65,536-fold to the result, LSB only
; Multiplication of the LSB with the MSB
  mul rAdcL,rMultH ; Multiplying LSB by MSB
  add rRes1,R0 ; Adding the 256-fold to the result, LSB
  adc rRes2,R1 ; dto., MSB
; Multiplication of the MSB with the LSB
  mul rAdcH,rMultL ; Multiplying MSB with LSB
  add rRes1,R0 ; Adding the 256-fold to the result, LSB
  adc rRes2,R1 ; dto., MSB
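  ldi rmp,0x80 ; Round result: add half an LSB to the skipped byte
  add rRes0,rmp ; Carry is set if bit 7 of rRes0 was one
  brcc ToSram ; No carry, nothing to round up
  ldi rmp,0 ; LDI leaves the carry untouched
  adc rRes1,rmp ; Round up, LSB
  adc rRes2,rmp ; dto., MSB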
ToSram:

The complete multiplication requires only 25 µs at 1 MHz clock, which is very fast.

Multiplying without hardware multiplier

Multiplying the AD result by 770 is a rather simple task: 770 is simply binary 0b0011.0000.0010. That means we'll have to add the 256-fold of the AD result to the AD result, multiply this sum by two and add the 256-fold of the AD result again.
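
Written as a formula, with x standing for the AD result (the symbol is introduced here only for illustration):

```latex
\[
770 = 512 + 256 + 2 = 2\cdot(256 + 1) + 256
\]
\[
770 \cdot x = 2\cdot(256\,x + x) + 256\,x
\]
```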

That means we do not have to perform all the shifting of zeros, because only three ADDs and one left shift are required. We'll see if this is faster than hardware multiplication with its 25 µs.

The source code goes like this:

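  mov rRes0,rAdcL ; Copy the AD result, LSB
  mov rRes1,rAdcH ; dto., MSB
  add rRes1,rAdcL ; Add the 256-fold, LSB
  mov rRes2,rAdcH ; dto., MSB (MOV leaves the carry untouched)
  ldi rmp,0
  adc rRes2,rmp ; Add the carry
  lsl rRes0 ; Multiply by two
  rol rRes1
  rol rRes2
  add rRes1,rAdcL ; Add the 256-fold again, LSB
  adc rRes2,rAdcH ; dto., MSB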

The complete operation needs, including final rounding, only 17 µs, so it is considerably faster than hardware multiplication. And: any AVR can do that, not only ATmega types. Conclusion: sometimes a specialized multiplication is faster than hardware multiplication.

Those who need it more general and for numbers that are not 770 can use the following multiplication algorithm:

; Loading the constant
  ldi R16,Low(cMult) ; Load constant, LSB
  mov rMultL,R16
  ldi R16,High(cMult) ; dto., MSB
  mov rMultH,R16
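; Multiplication
  clr R0 ; Third byte of the shifted AD result
  clr rRes0 ; Clear the result
  clr rRes1
  clr rRes2
Shift:
  lsr rMultH ; Shift the lowest bit of the constant to carry
  ror rMultL
  brcc Shift1 ; Bit is zero, do not add
  add rRes0,rAdcL ; Bit is one, add the shifted AD result
  adc rRes1,rAdcH
  adc rRes2,R0
Shift1:
  lsl rAdcL ; Multiply the AD result by two
  rol rAdcH
  rol R0
  tst rMultL ; Constant already zero?
  brne Shift ; No, go on
  tst rMultH
  brne Shift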

What looks rather short needs a lot of time: 121 µs. Here, the hardware multiplier can save a lot of µs.

Conversion to the decimal ASCII display string

Now we have, in all cases, the two result bytes in rRes2:rRes1, in binary format. From 0x0C05 we now have to do some magic to arrive at 30.77V.

The SRAM is a good place for such character strings. For this we need a pointer. We can make it without as well, but with a pointer it is more elegant.

The decimal conversion always works the same way. First we subtract 1,000 until an underflow occurs, and count how often this can be done without underflow (here: three times). The 1,000 is finally added again to compensate for the underflow step. The digit is checked for zero: if so, a blank is written to the string instead of the zero.

Then we subtract 100 from the number until an underflow occurs. After writing the resulting character to the string, we smuggle the decimal dot into the string.

The tens step is shorter, because the remaining number is only 8 bits wide.

Even simpler is the last digit: here we add the ASCII zero to the remainder. And finally we append a V.

ToSram:
  ldi ZH,High(sDecVtg) ; Pointer to SRAM, MSB
  ldi ZL,Low(sDecVtg) ; dto., LSB
  ldi rmp,Low(1000) ; Start with 1,000
  mov R0,rmp
  ldi rmp,High(1000)
  mov R1,rmp
  ldi rmp,'0'-1 ; Counter
ToSram1:
  inc rmp
  sub rRes1,R0
  sbc rRes2,R1
  brcc ToSram1
  add rRes1,R0 ; Undo last subtract, LSB
  adc rRes2,R1 ; dto., MSB
  cpi rmp,'0' ; Leading zero?
  brne ToSram2 ; No
  ldi rmp,' '
ToSram2:
  st Z+,rmp ; Write character to SRAM
  ldi rmp,Low(100) ; Continue with 100
  mov R0,rmp
  ldi rmp,High(100)
  mov R1,rmp
  ldi rmp,'0'-1
ToSram3:
  inc rmp
  sub rRes1,R0
  sbc rRes2,R1
  brcc ToSram3
  add rRes1,R0
  adc rRes2,R1
  st Z+,rmp
  ldi rmp,'.' ; Add decimal dot
  st Z+,rmp
  ldi rmp,10 ; Continue with 10
  mov R0,rmp
  ldi rmp,'0'-1
ToSram4:
  inc rmp
  sub rRes1,R0
  brcc ToSram4
  st Z+,rmp
  add rRes1,R0
  ldi rmp,'0' ; Last digit
  add rmp,rRes1
  st Z+,rmp
  ldi rmp,'V'
  st Z,rmp

The complete conversion lasts 93 µs at 1 MHz, which is not too long either.

The software is available as source code here.

Example 4: A measuring device for +/-15V

An additional example from practice: the AD converter measures positive as well as negative voltages.

The prescaler for positive and negative voltages

[Figure: The +/-15V prescaler]

This is what the prescaler looks like: again the input voltage is divided by the two resistors 270k and 10k, but now an 82k is added. It shifts the voltage on the ADC input up into the positive range by feeding in the positive operating voltage of +5 V through the 82k resistor. This leads to positive voltages on the ADC input pin over the complete input voltage range.

The downside of this solution is that a regulated operating voltage is required: at 3.3 V the 82k has to be smaller, at 3.0 V even smaller. So the operating voltage is not flexible any more, but has to be fixed. In no case should the ADC input get negative, because this would permanently damage the pin.

[Figure: ADC results over the whole +/-15V input voltage range]

This is what the ADC delivers as result for input voltages between -15 V and +15 V. One can see that at 0.00 V the ADC does not produce 512, but a little less, 490. This results from the fact that at 0.00 V a small current flows through the 270k from the ADC pin into the input. That makes the difference of 22. The red line for the 0.00 V input voltage does not land at 512, but a few digits below.
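
The value of 490 can be checked with a short calculation. It assumes the circuit as described above (10k to ground, 82k to +5 V, 270k to the input at 0.00 V) and the 1.1 V reference of Example 3; the reference voltage is an assumption, as it is not stated for this example:

```latex
\[
R_{10k \parallel 270k} = \frac{10 \cdot 270}{10 + 270}\,\text{k}\Omega = 9.64\,\text{k}\Omega
\]
\[
V_{ADC} = 5\,\text{V}\cdot\frac{9.64}{82 + 9.64} = 0.526\,\text{V},
\qquad
\frac{0.526\,\text{V}}{1.1\,\text{V}}\cdot 1024 \approx 490
\]
```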

Those who need different input voltage ranges, operating voltages and combinations of resistors can play with the LibreOffice Calc spreadsheet here.

Conversion of the ADC results to voltage strings

From the measured ADC results, 490 has to be subtracted. If that ends [...]

This is simple, if we can do it like that. Only one additional resistor, a little struggle with the sign character, and we are able to measure positive as well as negative voltages. And that rather exactly (with +/-10 mV resolution).
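
A minimal sketch of how the offset subtraction and the sign character could be handled. It is not the routine of the original software: the register assignments, the names rSign and cZero and the label SignDone are assumptions made for this illustration.

```asm
; Sketch: subtract the zero offset and derive the sign character
; Assumptions: rAdcH:rAdcL hold the ADC result, rSign and rmp are
; freely chosen registers, cZero is the ADC reading at 0.00 V
.equ cZero = 490
.def rAdcL = R2
.def rAdcH = R3
.def rmp = R16
.def rSign = R17
  ldi rSign,'+' ; Assume a positive voltage
  ldi rmp,Low(cZero)
  sub rAdcL,rmp ; Subtract the offset, LSB
  ldi rmp,High(cZero) ; LDI leaves the carry untouched
  sbc rAdcH,rmp ; dto., MSB with borrow
  brcc SignDone ; No underflow: zero or positive
  ldi rSign,'-' ; Underflow: negative voltage
  com rAdcL ; Two's complement: invert both bytes ...
  com rAdcH
  ldi rmp,1
  add rAdcL,rmp ; ... and add one
  ldi rmp,0
  adc rAdcH,rmp
SignDone:
; rAdcH:rAdcL now holds the absolute value, rSign the sign character
```

The absolute value can then be scaled and converted to a string in the same way as in Example 3, with the sign character written in front of the digits. The scaling factor for this prescaler is not derived here.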

Conclusions

Those who want to perform this in C and with the float library will have lots of fun: execution times more than 100-fold longer, and lots of further resources (lots of flash memory, which doesn't fit into an ATtiny any more, several registers, SRAM) are eaten up. Execution times and those further arguments speak for increasing the brain's effort a little bit and performing this with pseudo-floats as shown here.

©2003-2022 by http://www.avr-asm-tutorial.net