Do you need to handle large numbers, e.g. 64-bit unsigned integers? This page demonstrates how to handle 64-bit unsigned integer values with an 8-bit AVR.

- The structure of 64-bit numbers in an 8-bit controller
- Converting decimal strings to binary
- Binary addition of two large numbers
- Subtracting a large number from another large number
- Multiplying a large number by an 8-bit binary
- Listing decimals in binary
- Converting large numbers to a decimal string
- Conclusions

16 bits require two eight-bit registers, e.g. the registers R0 and R1. In assembler you are free to place those two bytes wherever you want and in any order. Whether the most significant byte (MSB) of the two is in R0 or in R1 is up to the assembler programmer. By simple convention we write this either as

Extending the 16 bits to 64 bits is easy: just add more registers. The 64 bits are, e.g., stored like this: R7:R6:R5:R4:R3:R2:R1:R0. You can also use R3:R7:R6:R0:R2:R1:R4:R5, if you can remember the order. It is your own choice whether you want to make life a little more complicated.

To write understandable code, I named R7:R6:R5:R4:R3:R2:R1:R0 as N1.
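To illustrate this byte order, here is a minimal Python sketch (a model for illustration, not part of the article's assembler source) that splits a 64-bit unsigned value into the eight bytes of N1. Index 0 stands for R0, the least significant byte, and index 7 for R7, the most significant byte. The function names `to_regs` and `from_regs` are hypothetical.

```python
def to_regs(value):
    """Split a 64-bit unsigned value into [R0, R1, ..., R7] (R0 = least significant byte)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit into 64 unsigned bits")
    return list(value.to_bytes(8, "little"))


def from_regs(regs):
    """Reassemble the 64-bit value from [R0, ..., R7]."""
    return int.from_bytes(bytes(regs), "little")


regs = to_regs(0x0123456789ABCDEF)
for number, byte in enumerate(regs):
    print("R%d = 0x%02X" % (number, byte))
```

Any other register order, such as the scrambled one above, would only change which list index maps to which register; the arithmetic stays the same.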

As the AVRs have plenty of registers available (not so in other controllers such as PICs or in the 3 GHz monsters in the PC), it is not necessary to store those numbers in SRAM. If you run out of registers, you can copy them to the SRAM of the controller. If your desired location in SRAM is sN1, the following code does that:

```
sts sN1,R0 ; Store register R0 on location sN1
sts sN1+1,R1
sts sN1+2,R2
sts sN1+3,R3
sts sN1+4,R4
sts sN1+5,R5
sts sN1+6,R6
sts sN1+7,R7
```

If you want to copy this 64-bit number to a different location, e.g. sN2, you'll have to copy all of that code and replace sN1 by sN2. As copying always involves 8 bytes in a row, you can choose another way of doing it: by using pointers. Here we use the pointer registers

- Z (R31:R30) to point to the SRAM location **sN1**, and
- X (R27:R26) to point to the registers, starting with R0.

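The source code here provides a routine called CopyN1ToSramN1, which points Z to sN1 and then copies. If Z already points to another location (e.g. sN2), call the entry point CopyN1ToSram instead: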

```
; Copy N1 (R7:...:R0) to an SRAM location
; rmp: temporary register outside R0..R7, X and Z (e.g. R16)
CopyN1ToSramN1:
ldi ZH,High(sN1) ; Point Z to Sram N1, MSB
ldi ZL,Low(sN1) ; dto., LSB
CopyN1ToSram:
clr XH ; Point X to N1, MSB
clr XL ; dto., LSB
CopyN1ToSram1:
ld rmp,X+ ; Read register
st Z+,rmp ; Write to SRAM
cpi XL,8 ; All 8 bytes copied?
brne CopyN1ToSram1
ret
```
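The pointer trick works because the register file is mapped into the data space: on classic AVRs, R0 to R31 occupy data addresses 0x00 to 0x1F, so X = 0 points to R0. On devices without this mapping, the loop would not work. The following Python model of the loop (illustration only; the address 0x0100 chosen for sN1 and the name `copy_n1_to_sram` are hypothetical) treats the whole data space as one byte array:

```python
def copy_n1_to_sram(data, z):
    """Model of CopyN1ToSram: copy R0..R7 to data[z..z+7] and return Z after the loop.

    data models the AVR data space, in which addresses 0x00..0x1F are R0..R31.
    """
    x = 0                      # clr XH / clr XL: X points to R0
    while True:
        rmp = data[x]          # ld rmp,X+
        x += 1
        data[z] = rmp          # st Z+,rmp
        z += 1
        if x == 8:             # cpi XL,8 / brne CopyN1ToSram1
            return z


S_N1 = 0x0100                  # hypothetical SRAM address of sN1
data = bytearray(0x0200)       # registers, I/O and a bit of SRAM as one array
data[0:8] = bytes(range(1, 9)) # R0..R7 hold 1..8
end = copy_n1_to_sram(data, S_N1)
print(data[S_N1:end].hex())
```

Because Z is advanced by `st Z+`, it points behind the copied bytes on return, so a second call could append another number directly.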

That routine is fast: at a 1 MHz clock rate it takes 68 µs (including the rcall and the ret). Where did I get the 68 µs from? Here is how to measure it:

- install the simulator avr_sim on your computer (pre-compiled 64-bit Windows and Linux executables are available; otherwise compile it yourself from the source code provided, using Lazarus),
- start avr_sim and open the source code 64bit.asm by selecting **Project** and **New from asm** from the menu,
- assemble the source code with **Assemble**,
- then search the listing for the line **rcall CopyN1ToSramN1** in the **Init code** section, right-click on that line and select **Breakpoint**,
- go to the next line and set a breakpoint on that line as well,
- start **Simulate**, and in the new window click on **Run**,
- when the simulation stops on the first breakpoint line, click on **Stopwatch** in the **Simulation status** window and tick **SRAM** in the **Show internal hardware** section,
- then let the simulator execute the subroutine by clicking **Run**, and check that the SRAM locations marked in yellow follow exactly the contents of R0 to R7 in the **Register** window,
- when the simulator stops on the next breakpoint line, read the Stopwatch's time and you know exactly how long it took.

Converting a decimal number to binary starts simply: define the number as a string in the source code:

```
ANumber1:
.db "123,456,789,012,345,678",0 ; A very large first number
```

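This places a string of ASCII characters, terminated by a null byte, into flash memory at location ANumber1.
Number conversion starts with clearing N1 (R7:...:R0). Pointer register Z points to the first character at the doubled address of ANumber1, because flash memory is organized in 16-bit words while LPM reads single bytes.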
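Why the doubled address? The following Python sketch (a model for illustration, not code from the article) treats flash as a list of 16-bit words. It assumes that `.db` puts the first byte of each pair into the low byte of a word, and that LPM uses bit 0 of Z to select the low (0) or high (1) byte of the word addressed by the remaining bits. The word address 0x0100 for ANumber1 and the function names are hypothetical.

```python
def place_db(flash, word_addr, data):
    """Store bytes like .db: first byte of each pair in the low byte of a word."""
    if len(data) % 2:
        data = data + b"\x00"            # pad to a whole word
    for i in range(0, len(data), 2):
        flash[word_addr + i // 2] = data[i] | (data[i + 1] << 8)


def lpm(flash, z):
    """Read one byte: Z >> 1 selects the word, bit 0 of Z the low or high byte."""
    return (flash[z >> 1] >> (8 * (z & 1))) & 0xFF


def read_string(flash, label):
    """Read a 0x00-terminated string that starts at word address label."""
    z = 2 * label                        # doubled address: words -> bytes
    chars = []
    while True:
        c = lpm(flash, z)                # lpm rmp,Z+
        z += 1
        if c == 0:                       # terminating null byte
            return "".join(chars)
        chars.append(chr(c))


ANUMBER1 = 0x0100                        # hypothetical word address of ANumber1
flash = [0xFFFF] * 0x0200                # erased flash reads as 0xFFFF
place_db(flash, ANUMBER1, b"123,456,789,012,345,678\x00")
print(read_string(flash, ANUMBER1))
```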

The '1' is then converted to binary by subtracting 0x30 ('0') from it. Then the previous N1 has to be multiplied by 10 before the next digit can be added to it. As N1 was zero, multiplying it by 10 changes nothing. For the following digits, the multiplication by 10 goes as follows:

- copy the eight bytes of N1 to another location (either to R15:...:R8 or to the SRAM),
- multiply N1 by two (shift R0 left, then rotate R1 left, with the carry bit going into bit 0 of the next byte, then R2, and so on until R7),
- if the last rotate ends with the carry flag set, an error has occurred (overflow error: the number is too large to fit into 64 bits),
- then multiply by two again, so that N1 is now multiplied by four (check again for an overflow),
- now add the previously stored number, using the instructions **add R0,R8** and **adc R1,R9** and so on, so that the carry from each lower byte addition is taken into account,
- if the carry is set after the last **adc R7,R15**, signal an overflow error,
- then multiply the five-fold of the previous number by two again (again checking for an overflow), which yields the result of the multiplication by 10.

If the next ASCII character is 0x00, the end of the number has been reached and Dec2BinN1 returns with the carry flag cleared (no error condition).
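The whole conversion can be modeled in a few lines of Python. This is a sketch, not the article's routine: an `OverflowError` stands in for the set carry flag, the end of the Python string plays the role of the 0x00 byte, and commas are simply skipped. That last point is an assumption: the ANumber1 string contains commas, but the text does not say how Dec2BinN1 treats them. The names `check`, `times10` and `dec2bin` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def check(n):
    """A carry out of R7 (a result above 64 bits) is the overflow error."""
    if n > MASK64:
        raise OverflowError("number too large for 64 bits")
    return n


def times10(n):
    """N1 * 10 as described: keep a copy, x2, x2 (= x4), add copy (= x5), x2 (= x10)."""
    copy = n                  # copy to R15:...:R8 or SRAM
    n = check(n << 1)         # lsl R0, rol R1 ... rol R7
    n = check(n << 1)         # x4
    n = check(n + copy)       # add R0,R8 / adc R1,R9 ... adc R7,R15
    n = check(n << 1)         # x10
    return n


def dec2bin(text):
    """Convert a decimal string to a 64-bit unsigned value (sketch of Dec2BinN1)."""
    n = 0                                 # clear N1
    for ch in text:
        if ch == ",":
            continue                      # assumption: separators are skipped
        if not "0" <= ch <= "9":
            raise ValueError("illegal character %r" % ch)
        n = times10(n)
        n = check(n + (ord(ch) - 0x30))   # subtract '0' and add the digit
    return n


print(dec2bin("123,456,789,012,345,678"))
```

Note that adding the digit can overflow as well: "18446744073709551616" is one more than the largest 64-bit value, and only the final digit addition pushes it over the limit.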

To simulate the conversion of 123,456,789,012,345,678 to a 64-bit binary, just single-step through the code from the beginning. Note that the conversion of that number takes 2.235 ms, i.e. 2,235 clock cycles at 1 MHz.

Later on in the source code, a smaller number is converted. The time consumed is nearly proportional to the number of digits, because the multiplication by 10 is the most time-consuming step.

All of this is very fast for a controller that handles only 8 bits at a time.

Adding takes only 16 µs, which is very fast.
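The addition is a carry chain over the eight bytes. Here is a Python sketch of that chain, assuming N2 is held in R15:R8 (as in the multiplication by 10) and representing each number as an 8-byte list with index 0 as the least significant byte. The name `add64` is hypothetical.

```python
def add64(n1, n2):
    """N1 += N2 on two 8-byte lists, index 0 = least significant byte.

    Mirrors add R0,R8 followed by adc R1,R9 ... adc R7,R15 and
    returns the carry after the last adc (1 means overflow).
    """
    carry = 0
    for i in range(8):
        total = n1[i] + n2[i] + carry   # add (carry is 0 for i = 0), then adc
        n1[i] = total & 0xFF            # the 8-bit register keeps the low byte
        carry = total >> 8              # carry flag for the next byte
    return carry


n1 = list((0xFFFFFFFF).to_bytes(8, "little"))
n2 = list((1).to_bytes(8, "little"))
carry = add64(n1, n2)
print(carry, hex(int.from_bytes(bytes(n1), "little")))
```

The carry ripples from byte 0 up to byte 4 here, exactly as the adc chain carries it through the registers.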

The source code then copies the result to location sN3, so that the binaries N1 and N2 and the result N3 can all be viewed in SRAM when simulating. Place a breakpoint behind that rcall to stop at this point.

Subtraction is just as fast: 16 µs.

The result of the subtraction is also written to sN3 in the SRAM.
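Subtraction uses the same chain with a borrow instead of a carry. A Python sketch, again assuming N2 in R15:R8 and 8-byte lists with index 0 as the least significant byte; `sub64` is a hypothetical name:

```python
def sub64(n1, n2):
    """N1 -= N2 on 8-byte lists, like sub R0,R8 / sbc R1,R9 ... sbc R7,R15.

    Returns the borrow (carry flag) after the last sbc: 1 means N2 was larger than N1.
    """
    borrow = 0
    for i in range(8):
        diff = n1[i] - n2[i] - borrow
        n1[i] = diff & 0xFF             # wraps around like the 8-bit register
        borrow = 1 if diff < 0 else 0   # carry flag for the next sbc
    return borrow


n1 = list((0x100).to_bytes(8, "little"))
n2 = list((1).to_bytes(8, "little"))
print(sub64(n1, n2), hex(int.from_bytes(bytes(n1), "little")))
```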

Then the multiplier in R16 is shifted right, bit by bit, into the carry flag. If the bit is a one, N2 is added to N1. Error checking is done after the last ADC.

Then it is checked whether R16 is already zero (nothing more to multiply). If so, the subroutine returns with a cleared carry flag.

If not, N2 is multiplied by two. Error checking is done after the last ROL, signalling an overflow.
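The shift-and-add loop can be sketched in Python as follows. This models the steps above under an assumption the text does not state explicitly: N1 serves as the accumulator and starts at zero. An `OverflowError` stands in for the set carry flag, and `mul64x8` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def mul64x8(n2, r16):
    """Return N2 * R16 by shift-and-add (sketch of the multiplication loop)."""
    if not 0 <= r16 <= 0xFF:
        raise ValueError("multiplier must be an 8-bit value")
    n1 = 0                     # accumulator N1 (assumed cleared at the start)
    while True:
        bit = r16 & 1          # lsr R16: bit 0 goes to the carry flag
        r16 >>= 1
        if bit:                # carry set: add N2 to N1
            n1 += n2
            if n1 > MASK64:    # carry after the last adc
                raise OverflowError("product exceeds 64 bits")
        if r16 == 0:           # no more bits to multiply: done
            return n1
        n2 <<= 1               # N2 * 2 (lsl/rol chain)
        if n2 > MASK64:        # carry after the last rol
            raise OverflowError("product exceeds 64 bits")


print(mul64x8(123456789, 255))
```

Checking R16 for zero before doubling N2 matters: otherwise a large N2 could trigger an overflow error on a shift whose result is never used.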

Multiplying by the 8-bit binary 255 (all bits set, so N2 is added eight times) takes a very fast 205 µs. A hardware multiplier would not be much faster than this software multiplication.

The result is also copied to sN3.

Calculation and listing need 2.991 ms.

This involves a

The conversion of the 64-bit binary in N1 to a decimal string goes like this:

- the 64-bit binary representation of the decimal (a power of ten) is read from the flash table to N2; if the first byte is 0xFF, the end of the conversion has been reached,
- it is subtracted from N1 until an underflow occurs (the last, failed subtraction has to be undone),
- the number of successful subtractions is converted to an ASCII digit: **subi R16,-'0'** adds 0x30 to the digit,
- then it is checked whether the T flag is set, which signals blanking of leading zeroes: if it is not set, the character is written directly to the SRAM buffer; if it is set, it is checked whether the digit is still a zero; if yes, a blank is written, if not, the T flag is cleared and the digit is written to SRAM,
- a counter is decreased (initially from two downwards, later from three); when it reaches zero, a thousands separator is added.
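The steps above can be modeled in Python. This is a sketch under several assumptions not stated in the article: the powers of ten are a Python list instead of a 0xFF-terminated flash table, the thousands separator is a comma (as in the ANumber1 string), separators inside the blanked leading part become blanks too, the last digit is always written (so that zero shows as "0"), and no separator follows the last digit. The name `bin2dec` is hypothetical.

```python
POWERS = [10 ** k for k in range(19, -1, -1)]   # 10^19 ... 10^0, all fit into 64 bits


def bin2dec(n, sep=","):
    """Right-adjusted decimal string of a 64-bit value (20 digits + 6 separators)."""
    if not 0 <= n <= (1 << 64) - 1:
        raise ValueError("not a 64-bit unsigned value")
    out = []
    blanking = True              # T flag set: leading zeros become blanks
    counter = 2                  # first group has two digits (20 = 2 + 6 * 3)
    last = len(POWERS) - 1
    for i, p in enumerate(POWERS):
        digit = 0
        while n >= p:            # subtract as long as no underflow occurs
            n -= p
            digit += 1
        if blanking and digit == 0 and i != last:
            out.append(" ")      # still a leading zero: write a blank
        else:
            blanking = False     # clear T: all following digits are written
            out.append(chr(digit + 0x30))   # subi R16,-'0'
        counter -= 1
        if counter == 0 and i != last:
            out.append(" " if blanking else sep)
            counter = 3
    return "".join(out)


print("[" + bin2dec(123456789012345678) + "]")
print("[" + bin2dec(2**64 - 1) + "]")
```

Because 2^64 - 1 = 18,446,744,073,709,551,615 has 20 digits, the field holds 20 digits plus 6 separators, and a counter starting at two yields exactly the 2 + 3 + ... + 3 grouping. Calling `.lstrip()` on the result gives the left-adjusted form.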

If you need the number left-adjusted, just save a copy of the X pointer at the moment the T flag is cleared. At the beginning, set this copy to the address of the last digit in SRAM, to make sure that a single-digit number is covered as well.

Conversion to an ASCII string requires 2.682 ms for the larger of the two numbers and 2.052 ms for the smaller one.

Assembler provides fast (in the millisecond range) and efficient code for all these operations. There is no need to involve lengthy, space-consuming libraries or to switch to a 16-, 32-, 48- or 64-bit controller.

One reason for that efficient code is that the AVRs have plenty of registers and a very efficient instruction set. Without that, the routines would be much longer and would take much longer to execute.


(©)2019 by http://www.avr-asm-tutorial.net