Timing loop

Time loops with 24 bit and more registers in AVR assembler based on SUBI

The following describes a different timing loop with 24 bits. It demonstrates that the same task can be solved by very different means in assembler. This solution is shorter, easier to understand and simpler to calculate than the previous one. The code is then expanded to 64 bits.

The source code of the delay loop is like this:

.equ c = 12345 ; A constant for counting down
;
.def rC1 = R16 ; Three registers to count down
.def rC2 = R17 ; The second one
.def rC3 = R18 ; The third one
;
Main: ; The main code starts here
  sbi DDRB,PORTB0 ; Make PB0 an output
RestartCount: ; The counter loop starts here
  ldi rC1,Byte1(c-1) ; Load LSB of the counter value
  ldi rC2,Byte2(c-1) ; Load middle of the counter value
  ldi rC3,Byte3(c-1) ; Load MSB of the counter value
CountDown:
  subi rC1,1 ; Sets the carry flag if previously zero
  brcc CountDown ; If not yet carry continue counting
  subi rC2,1 ; Downcount the middle byte
  brcc CountDown ; If not yet carry count on
  subi rC3,1 ; Downcount the MSB byte
  brcc CountDown ; If not yet carry count on
  sbi PINB,PINB0 ; Toggle PB0 state
  rjmp RestartCount ; Start new count cycle

The SUBI instruction sets the carry flag when the subtraction underflows from zero to 0xFF. This is the signal to subtract one from the next higher byte. The down-count ends when all three bytes are 0xFF, so one more count is executed than with a loop that stops on zero. This extra count is compensated by loading c-1 instead of c when the loop starts.

Note that toggling a pin's output port bit by writing a one to its PIN bit is not implemented in older AVR devices, but the ATtiny13 and many others support it.

Calculating the loop's delay

Another method to calculate loop delays is to list the instructions and to evaluate how often these are executed within the complete loop. The table demonstrates that.

The list's source line column lists all instructions that the loop consists of. The next column describes how often each instruction is executed within the loop, depending on the constant c. For every conditional branch in the loop two numbers have to be considered: the number of cases where the jump occurs (those need two clock cycles each) and the number of cases where it does not (consuming only one clock cycle). The single clock cycle executions are written to the next column, the two-cycle executions are listed in the following column, already multiplied by two. The last column adds the two previous columns and sorts the formula's constituents by their type.

Note that the divisions by 256 and by 65,536 are integer divisions: only the integer part of the result is used, the fraction is dropped and is not used for rounding up.

The complete formula for the clock cycles can easily be derived then by summing up the different constituents.
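As a cross-check of this bookkeeping, the loop can also be stepped through in a small simulation. The following Python sketch is not part of the original code. It assumes the classic AVR timings (LDI and SUBI one cycle, BRCC two cycles when it jumps and one when it falls through, SBI and RJMP two cycles each) and counts one complete pass from RestartCount back to RestartCount, including the three LDI. The function names are made up for this illustration.

```python
def simulate_24bit(c):
    """Clock cycles of one pass of the 24-bit SUBI loop for the constant c."""
    n = c - 1                                   # the listing loads c-1
    regs = [n & 0xFF, (n >> 8) & 0xFF, (n >> 16) & 0xFF]
    cycles = 3                                  # three LDI, one cycle each
    while True:
        for i in range(3):
            cycles += 1                         # SUBI
            borrow = regs[i] == 0               # carry is set on 0 -> 0xFF
            regs[i] = (regs[i] - 1) & 0xFF
            if not borrow:
                cycles += 2                     # BRCC jumps back to CountDown
                break
            cycles += 1                         # BRCC falls through
        else:
            break                               # all three bytes underflowed
    return cycles + 2 + 2                       # SBI and RJMP (assumed 2 each)


def closed_form_24bit(c):
    """Closed form for the same count, integer divisions as in the text."""
    return 3 * c + 2 * ((c - 1) // 256) + 2 * ((c - 1) // 65536) + 10


if __name__ == "__main__":
    for c in (1, 256, 12345):
        print(c, simulate_24bit(c), closed_form_24bit(c))
```

Under these assumptions the simulated count agrees with the closed form in the second function. Three of the ten fixed cycles are the LDI instructions, so the constant depends on whether the loading is counted, and (c-1)/256 differs from c/256 only when c is a multiple of 256.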

Calculating c for a certain number of clock cycles

The formula to calculate c for a number of clock cycles CC is rather simple:
c = (CC - 7) / (3 + 2 / 256 + 2 / 65536) = (CC - 7) / 3.00784301757812
or roughly divided by 3.

If you want to calculate this exactly in an assembler source file, you have to make sure that the small fractions in the divisor are not lost. So multiply both the dividend and the divisor by a large enough number, e.g. 0x100000000, before dividing 2 by 256 and 2 by 65536. The exact formulation then is:

.equ cM=0x100000000
.equ cCalc =(cM*(cc-7))/(3*cM+2*cM/256+2*cM/65536)

That ensures that the derived number for c is absolutely correct.
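The same integer arithmetic can be tried outside the assembler. This Python sketch mirrors the two .equ lines with truncating integer divisions; the function name and the sample value of 120,000 clock cycles (100 ms at 1.2 MHz) are assumptions chosen for illustration.

```python
def c_for_cycles(cc, scale=0x100000000):
    """Integer-only version of cCalc; scale plays the role of cM."""
    # Same evaluation order as 3*cM+2*cM/256+2*cM/65536 in the assembler.
    divisor = 3 * scale + 2 * scale // 256 + 2 * scale // 65536
    return (scale * (cc - 7)) // divisor


if __name__ == "__main__":
    cc = 120_000                                # sample: 100 ms at 1.2 MHz
    print("c =", c_for_cycles(cc))
    # Width of the largest intermediate product, in bits.
    print("bits needed:", (0x100000000 * (cc - 7)).bit_length())
```

The second printed value shows how wide the intermediate product becomes, which is why this scaling only works in an assembler that calculates with more than 32 bits.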

64 bit loop

For extremely long times, such as hours, days and years, more than 24 bits are necessary. The same algorithm works here, and with 64 bits delays of up to 730,000 years can be handled.

The counting loop with 64 bits looks like this:

Restart:
  ldi rCnt0,Byte1(cCnt) ; +1 = 1
  ldi rCnt1,Byte2(cCnt) ; +1 = 2
  ldi rCnt2,Byte3(cCnt) ; +1 = 3
  ldi rCnt3,Byte4(cCnt) ; +1 = 4
  ldi rCnt4,Byte1(cCnt/65536/65536) ; +1 = 5
  ldi rCnt5,Byte2(cCnt/65536/65536) ; +1 = 6
  ldi rCnt6,Byte3(cCnt/65536/65536) ; +1 = 7
  ldi rCnt7,Byte4(cCnt/65536/65536) ; +1 = 8
Count:
  subi rCnt0,1 ; Downcount rCnt0
  brcc Count ; First inner loop
  subi rCnt1,1 ; Downcount rCnt1
  brcc Count ; First outer loop
  subi rCnt2,1 ; Downcount rCnt2
  brcc Count ; Second outer loop
  subi rCnt3,1 ; Downcount rCnt3
  brcc Count ; Third outer loop
  subi rCnt4,1 ; Downcount rCnt4
  brcc Count ; Fourth outer loop
  subi rCnt5,1 ; Downcount rCnt5
  brcc Count ; Fifth outer loop
  subi rCnt6,1 ; Downcount rCnt6
  brcc Count ; Sixth outer loop
  subi rCnt7,1 ; Downcount rCnt7
  brcc Count ; Seventh outer loop
  sbi pIn,bIn ; Ignite, +2 = 10
  rjmp Restart ; Restart, +2 = 12

The calculation is also relatively simple. The inner loop in the loop section is executed cCnt times plus one. Each execution consumes three clock cycles (one for SUBI, two for the BRCC). The last execution needs only two clock cycles because the jump back is not executed. The BRCC of the inner loop therefore jumps back (cCnt + 1) - (cCnt / 256 + 1) times and falls through cCnt / 256 + 1 times. The two "+1" are caused by the fact that all loops are executed at least once and that the counter counts down to 0xFFFF.FFFF.FFFF.FFFF and not to zero.

The next loop is executed "(c / 256) + 1" times, the following loops each 256 times less. This yields the following rows:

Loop   Executions                              Abbreviation
1      cCnt + 1                                c + 1
2      cCnt / 256 + 1                          c8
3      cCnt / 65,536 + 1                       c16
4      cCnt / 16,777,216 + 1                   c24
5      cCnt / 4,294,967,296 + 1                c32
6      cCnt / 1,099,511,627,776 + 1            c40
7      cCnt / 281,474,976,710,656 + 1          c48
8      cCnt / 72,057,594,037,927,936 + 1       c56
Last   cCnt / 18,446,744,073,709,551,616 + 1   c64


Please note that the divisions are integer divisions: the fractional part is dropped (rounded down).
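For a concrete value of cCnt the execution counts can be listed with a few lines of Python. This is only an illustrative sketch; the function name and the sample value 39976 are assumptions and do not appear in the original source code.

```python
def loop_executions(c_cnt):
    """Execution counts c+1, c8, c16, ..., c64 for a given cCnt.

    Entry n is cCnt / 256^n + 1 with integer division, so entry 0 is
    cCnt + 1 and entry 8 is the 'Last' row.
    """
    return [c_cnt // 256**n + 1 for n in range(9)]


if __name__ == "__main__":
    labels = ["c + 1", "c8", "c16", "c24", "c32", "c40", "c48", "c56", "c64"]
    for label, count in zip(labels, loop_executions(39976)):
        print(f"{label:6} {count}")
```

For every cCnt that fits into 64 bits the last entry is 1, because the complete counter underflows exactly once per pass.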

This yields the following rows of clock cycles.

Code line        Number of executions with             Total clocks
                 one clock cycle   two clock cycles
; Loading        -                 -                   8
subi rCnt0,1     c + 1             -                   c + 1
brcc Count       c8                c + 1 - c8          c8 + 2*(c + 1) - 2*c8
subi rCnt1,1     c8                -                   c8
brcc Count       c16               c8 - c16            c16 + 2*c8 - 2*c16
subi rCnt2,1     c16               -                   c16
brcc Count       c24               c16 - c24           c24 + 2*c16 - 2*c24
subi rCnt3,1     c24               -                   c24
brcc Count       c32               c24 - c32           c32 + 2*c24 - 2*c32
subi rCnt4,1     c32               -                   c32
brcc Count       c40               c32 - c40           c40 + 2*c32 - 2*c40
subi rCnt5,1     c40               -                   c40
brcc Count       c48               c40 - c48           c48 + 2*c40 - 2*c48
subi rCnt6,1     c48               -                   c48
brcc Count       c56               c48 - c56           c56 + 2*c48 - 2*c56
subi rCnt7,1     c56               -                   c56
brcc Count       c64               c56 - c64           c64 + 2*c56 - 2*c64
sbi pIn,bIn      -                 1                   2
rjmp Restart     -                 1                   2


If all instruction cycles are added together, the following formula describes the total clock cycles:
CC = 3*c + 2*c8 + 2*c16 + 2*c24 + 2*c32 + 2*c40 + 2*c48 + 2*c56 - c64 + 15
c is approximately CC / 3 (with an error smaller than 1%), but it can be calculated exactly with this formula.
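The sum can be checked by stepping through the 64-bit loop in a simulation. The Python sketch below is an illustration and not part of the original code. It assumes the classic AVR timings (LDI and SUBI one cycle, BRCC two cycles when it jumps and one when it falls through, SBI and RJMP two cycles each) and counts one complete pass including the eight LDI. It then subtracts the cCnt-dependent terms of the formula, so that the remaining value is the fixed overhead.

```python
def simulate_64bit(c_cnt):
    """Clock cycles of one pass of the 64-bit SUBI loop for a given cCnt."""
    regs = [(c_cnt >> (8 * i)) & 0xFF for i in range(8)]
    cycles = 8                                  # eight LDI
    while True:
        for i in range(8):
            cycles += 1                         # SUBI
            borrow = regs[i] == 0               # carry is set on 0 -> 0xFF
            regs[i] = (regs[i] - 1) & 0xFF
            if not borrow:
                cycles += 2                     # BRCC jumps back to Count
                break
            cycles += 1                         # BRCC falls through
        else:
            break                               # all eight bytes underflowed
    return cycles + 2 + 2                       # SBI and RJMP (assumed 2 each)


def variable_part(c_cnt):
    """3*c + 2*(c8 + ... + c56) - c64 with c8..c64 as defined in the table."""
    c = [c_cnt // 256**n + 1 for n in range(9)]  # c[1] = c8, ..., c[8] = c64
    return 3 * c_cnt + 2 * sum(c[1:8]) - c[8]


if __name__ == "__main__":
    for c_cnt in (0, 1, 255, 256, 70000):
        total = simulate_64bit(c_cnt)
        print(c_cnt, total, total - variable_part(c_cnt))
```

The last printed column is the same for every cCnt and corresponds to the constant term of the formula under the stated timing assumptions. Only small values of cCnt are practical here, because the simulation executes every single count.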

Because the conversion of e.g. years into clock cycles is not that simple, I added the following lines to the code:

; **********************************
;   A D J U S T A B L E   C O N S T
; **********************************
;
; Compose the duration of counting
.equ cCntYears = 0
.equ cCntMonthes = 0
.equ cCntDays = 0
.equ cCntHours = 0
.equ cCntMinutes = 0
.equ cCntSeconds = 0
.equ cCntMilliseconds = 100
.equ cCntMicroseconds = 0
;
; The clock frequency
.equ Clock = 1200000 ; of the ATtiny13
;
; **********************************
;  F I X  &  D E R I V.  C O N S T
; **********************************
;
.equ cCntSec = cCntSeconds+60*cCntMinutes+3600*cCntHours+86400*cCntDays+2629800*cCntMonthes+31557600*cCntYears
.equ cCntUSec = 1000*cCntMilliseconds+cCntMicroseconds
.equ cCnt = (cCntSec * Clock + Clock * cCntUSec / 1000000 - 70) / 3

Editing times is comfortable with this: the assembler does all the conversion work. If you want to have the first pulse one hour after the operation voltage has been applied, just set cCntHours to one.
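To see which number the assembler derives from these settings, the three .equ lines can be mirrored in Python. This is a sketch for experimenting. The function name and keyword arguments are invented for this example, and the constants (including the 70 and the division by 3) are copied unchanged from the lines above.

```python
def c_cnt(clock_hz, years=0, months=0, days=0, hours=0, minutes=0,
          seconds=0, milliseconds=0, microseconds=0):
    """Mirror of cCntSec, cCntUSec and cCnt, using integer division."""
    sec = (seconds + 60 * minutes + 3600 * hours + 86400 * days
           + 2629800 * months + 31557600 * years)
    usec = 1000 * milliseconds + microseconds
    # Same expression as the .equ line; intended for non-negative results.
    return (sec * clock_hz + clock_hz * usec // 1000000 - 70) // 3


if __name__ == "__main__":
    print(c_cnt(1_200_000, milliseconds=100))   # the setting shown above
    print(c_cnt(1_200_000, hours=1))            # first pulse after one hour
```

Python integers have no size limit, so the sketch also works for durations where the intermediate products no longer fit into 32 bits.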

The code of the 64-bit-looping is here.

You see that optimization in assembler provides many opportunities. Here we just replaced a DEC with a SUBI instruction and a BRNE with a BRCC instruction, and we got a short and simple piece of code that is also easy to understand and to calculate.

But please note that the calculation of very long time periods can exceed the limits of assemblers that only work with 32-bit integers. Assembling then ends with an overflow message. gavrasm and avr_sim work with INT64, so time loops lasting thousands of years can be handled without any problems.
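How quickly the 32-bit limit is reached can be estimated with the product cCntSec * Clock, the largest intermediate value in the cCnt expression. The following Python sketch assumes signed 32-bit and signed 64-bit integers; the function name and the two sample durations are chosen only for illustration.

```python
INT32_MAX = 2**31 - 1   # assumption: the 32-bit assembler uses signed integers
INT64_MAX = 2**63 - 1   # signed 64-bit limit


def cycles_product(clock_hz, seconds):
    """Largest intermediate of the cCnt expression: cCntSec * Clock."""
    return seconds * clock_hz


if __name__ == "__main__":
    one_hour = cycles_product(1_200_000, 3600)
    thousand_years = cycles_product(1_200_000, 1000 * 31_557_600)
    print("1 hour     > INT32_MAX:", one_hour > INT32_MAX)
    print("1000 years > INT64_MAX:", thousand_years > INT64_MAX)
```

At 1.2 MHz a delay of one hour already exceeds the signed 32-bit range, while 1000 years still fit into 64 bits.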

Modified 64-bit delay

The rather complicated calculation of the delay time can be simplified by adding a NOP at the end of each counting loop, which brings all 256 executions of a counter loop to the same length in clock cycles. Such a counter loop now looks like this:

  ldi rCnt,LoopRepetitions
Counterloop:
  subi rCnt,1
  brcc Counterloop
  nop

Each loop execution now needs exactly 3 clock cycles. If you combine eight such loops, each additional loop execution needs three clock cycles. The number of clock cycles is then:
CC = 3 * cCnt / 256^0 + 3 * cCnt / 256^1 + 3 * cCnt / 256^2 + 3 * cCnt / 256^3 + ... + 3 * cCnt / 256^7 + 36
Each cCnt / 256^N stands for the executions of byte loop N. The constant 36 covers the loading time, the execution of all the loops in the last cycle, the switching of the pin and the jump back.

This can be coded in assembler much more simply than in the case above. The source code here does this and demonstrates such a counting loop. Additionally, this source code keeps the calculation of very long times (> 10 years) compatible with 64-bit integer handling. But the calculation of 256^7 still fails, so the algorithm produces a small error in times longer than 1,000 years, at which point battery sustainability introduces additional problems anyway.
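The simplified formula can be checked against a simulation as well. The Python sketch below is not the linked source code. It assumes that the eight byte loops are chained as in the 64-bit listing, that each BRCC jumps back to the start of the first loop, that a NOP follows each BRCC, and that the timings are those of the classic AVR core (LDI, SUBI and NOP one cycle, BRCC two cycles when it jumps and one when it falls through, SBI and RJMP two cycles each).

```python
def simulate_nop_variant(c_cnt):
    """Clock cycles of one pass of the 64-bit loop with a NOP per byte loop."""
    regs = [(c_cnt >> (8 * i)) & 0xFF for i in range(8)]
    cycles = 8                                  # eight LDI
    while True:
        for i in range(8):
            borrow = regs[i] == 0               # carry is set on 0 -> 0xFF
            regs[i] = (regs[i] - 1) & 0xFF
            if not borrow:
                cycles += 1 + 2                 # SUBI + BRCC jumping back
                break
            cycles += 1 + 1 + 1                 # SUBI + BRCC not taken + NOP
        else:
            break                               # all eight bytes underflowed
    return cycles + 2 + 2                       # SBI and RJMP (assumed 2 each)


def formula_nop_variant(c_cnt):
    """CC = 3 * (cCnt/256^0 + ... + cCnt/256^7) + 36, integer divisions."""
    return 3 * sum(c_cnt // 256**n for n in range(8)) + 36


if __name__ == "__main__":
    for c_cnt in (0, 1, 256, 70000):
        print(c_cnt, simulate_nop_variant(c_cnt), formula_nop_variant(c_cnt))
```

Because every path through a byte loop costs three cycles, the simulation only has to count loop executions, which is the simplification the NOP buys.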


©2009-2020 by http://www.avr-asm-tutorial.net