
Beginner's introduction to AVR assembler language

Floating point numbers in assembler language

Floating points, if necessary

For those who want to make life more complicated than necessary: besides whole numbers (integers), signed integers and fixed-point numbers, floating point numbers are also available. What a higher-level language writes simply as 1.234567 is rather complicated in assembler. Here is why.

The format of floating point numbers

Binary floating point numbers consist of two constituents:
  1. a mantissa, and
  2. an exponent.
In the decimal world the mantissa gives the normal number part; in 1.234567 this is the 1.234567. The precision of the number is given by its seven digits. The exponent says how often the mantissa has to be multiplied by 10 (the base in the decimal world). In our example the exponent is zero. The number could also be written as 0.1234567*10^1, or shorter as 0.1234567E+01, which says: shift the mantissa's digits one position to the left. It could also be written as 12.34567E-01, which says: shift the mantissa's digits one position to the right. The form 1.234567E+00 is called normalized: exactly one non-zero digit stands in front of the decimal point. Numbers of 10 or larger are repeatedly divided by 10, each time increasing the exponent by one. Numbers smaller than 1 are repeatedly multiplied by 10, each time decreasing the exponent by one. Numbers smaller than one therefore have a negative exponent. That is why the exponent has to be a signed integer.

Numbers themselves can also be negative, such as -1.234567. Because multiplying or dividing by 10 does not change the sign, the mantissa needs a sign bit of its own. So we can handle positive as well as negative numbers, such as -182.962°C, the boiling point of oxygen. Of course we have to divide this boiling point by 100 to get a normalized mantissa, and its exponent will be plus two. Normalized, we get -1.82962E+02 for that boiling point.
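
The decimal normalization described above can be sketched in a few lines. This is only an illustration in Python, not AVR assembler; the function name normalize_decimal is made up for this example, and the sign is handled separately from the magnitude, as the text suggests.

```python
def normalize_decimal(value):
    """Return (mantissa, exponent) so that value = mantissa * 10**exponent
    with exactly one non-zero digit in front of the decimal point."""
    if value == 0:
        return 0.0, 0                 # zero cannot be normalized
    sign = -1 if value < 0 else 1     # the sign is kept apart from the magnitude
    mantissa = abs(value)
    exponent = 0
    while mantissa >= 10:             # too large: divide by 10, raise the exponent
        mantissa /= 10
        exponent += 1
    while mantissa < 1:               # too small: multiply by 10, lower the exponent
        mantissa *= 10
        exponent -= 1
    return sign * mantissa, exponent


mantissa, exponent = normalize_decimal(-250.0)
print(mantissa, exponent)
```

The two loops are exactly the repeated division and multiplication by 10; the exponent simply counts how often each step was needed.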

[Figure: 16 bit float number]

Converted to the binary world, where the base is 2, floating point numbers need at least two bytes: one for the mantissa and one for the exponent. Both are signed integers. The meaning of one bit in the mantissa and one bit in the exponent is very different:
  1. In the mantissa each bit, counted from the dot (or better: from its highest non-sign bit), stands for 1 / 2^n, where n is its position in the mantissa. So the first bit is 1 / 2^1 = 1 / 2, or in decimal 0.5. Each further bit stands for half of the previous bit, so the next in line is 0.25, the one after that 0.125, and so on.
  2. The exponent is simpler to understand: an 8-bit exponent reaches from zero to 127 (hexadecimal 0x00 to 0x7F) for positive exponents and from -1 to -128 (hexadecimal 0xFF for -1, 0x80 for -128) for negative exponents. For a positive exponent n the mantissa has to be shifted n positions to the left, for a negative exponent -n it has to be shifted n positions to the right. A left shift multiplies the mantissa by two, a right shift divides it by two.
Because the exponent scales the number by a power of two (* 2^n), each of its bits is far more powerful than a bit in the mantissa. So 2^127 is 1.7 multiplied by 10 to the power of 38, or 1.7*10^38, or even shorter 1.7E38. Vice versa, negative exponents make the exponent part of the number very small: 2^-128 is decimal 2.9E-39. With an eight bit exponent alone we cover the range of numbers from 2.9E-39 to 1.7E+38. That should be large or small enough, not for an astronomer but for most of the rest of calculating mankind. So an 8-bit exponent is sufficient.
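
To make the two bit meanings concrete, here is a small Python sketch that decodes such a two-byte float. It rests on assumptions the text does not spell out: the mantissa byte is taken as a sign bit plus seven fraction bits (sign and magnitude), the exponent byte as a two's complement integer, and the name decode_float16 is invented for this example.

```python
def decode_float16(mantissa_byte, exponent_byte):
    """Decode a two-byte float: one mantissa byte, one exponent byte."""
    # Assumption: bit 7 of the mantissa byte is the sign, bits 6..0 the magnitude.
    sign = -1 if mantissa_byte & 0x80 else 1
    # Bit 6 stands for 0.5, bit 5 for 0.25, ... bit 0 for 1/128.
    fraction = (mantissa_byte & 0x7F) / 128
    # Two's complement exponent: 0x00..0x7F is 0..127, 0x80..0xFF is -128..-1.
    exponent = exponent_byte - 256 if exponent_byte & 0x80 else exponent_byte
    return sign * fraction * 2.0 ** exponent


print(decode_float16(0x40, 0x01))    # mantissa 0.5, shifted left once
print(decode_float16(0x60, 0xFF))    # mantissa 0.75, shifted right once
print(2.0 ** 127, 2.0 ** -128)       # the outer limits of the exponent part
```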

The variations that come with the mantissa are very small: the largest value of an 8-bit mantissa, 0x7F, is decimal 0.992 and so only 0.008 below one. So an 8-bit mantissa can only handle numbers with slightly more than two decimal digits of precision. That is by far not enough for calculating interest rates, for other commercial purposes or for engineering, and is only suitable for rather rough technical measurements. 8-bit mantissas have the same accuracy as an ancient slide rule (for those who are still familiar with that kind of calculating machine).

[Figure: 24 bit float number]

To increase the precision we add another eight bits to the mantissa. The lowest of the mantissa's bits now stands for 0.0000305. This increases the precision to slightly more than four digits. If we add yet another byte to the mantissa we are at slightly more than six decimal digits, and the complete number already has 32 bits or four bytes. 16-bit mantissas are not precise enough to calculate Mandelbrot sets, but are suitable for most technical applications.
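
The precision figures for 8, 16 and 24 bit mantissas can be checked with a short calculation. The sketch below is Python, not assembler, and assumes that one bit of each mantissa is used up by the sign; the name mantissa_precision is chosen freely.

```python
import math


def mantissa_precision(total_bits):
    """Return (weight of the lowest bit, equivalent decimal digits)."""
    value_bits = total_bits - 1            # assumption: one bit holds the sign
    resolution = 2.0 ** -value_bits        # weight of the lowest mantissa bit
    digits = value_bits * math.log10(2)    # each bit is worth about 0.3 digits
    return resolution, digits


for bits in (8, 16, 24):
    resolution, digits = mantissa_precision(bits)
    print(bits, resolution, round(digits, 2))
```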

[Table: Resolutions of floats with different bit length and combinations]

If you need higher resolution, pick a suitable format from this table.

Because each additional mantissa bit increases precision by roughly a third of a decimal digit, the inventors of binary floats gained one more bit with a trick: because a normalized binary mantissa always starts with a one, this bit can be skipped, and an additional bit fits into the 16 bit mantissa at the end. Tricks of this kind increase the variety of floating point formats and make them more and more complicated to understand: of course the skipped one-bit on top has to be added back when calculating with the mantissa. It can take the place of the mantissa's sign bit, if that sign bit is stored elsewhere.
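
How the skipped leading one might be dropped and restored is sketched below in Python. The layout is an assumption made only for this illustration: the sign is stored elsewhere, 15 stored bits follow the implicit leading one, and the full mantissa therefore has 16 value bits with weights 2^-1 to 2^-16. Both function names are hypothetical.

```python
def drop_hidden_bit(full16):
    """Strip the leading one of a normalized 16-bit mantissa magnitude."""
    if not full16 & 0x8000:
        raise ValueError("mantissa is not normalized: top bit must be one")
    return full16 & 0x7FFF               # only the 15 bits behind the one are stored


def restore_hidden_bit(stored15):
    """Put the skipped one back on top and return the mantissa's value."""
    full16 = 0x8000 | (stored15 & 0x7FFF)
    return full16 / 65536                # 16 value bits: 2^-1 down to 2^-16


print(restore_hidden_bit(0x0000))        # only the hidden bit is set
print(restore_hidden_bit(0x4000))        # hidden bit plus the next bit
```

Every calculation has to start with the restore step and end with the drop step, which is the extra effort the text mentions.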

Floats do have one advantage: they simplify the multiplication and division of two numbers. To multiply two floats with mantissas M1 and M2, we simply multiply the two mantissas and, even simpler, add their exponents E1 and E2. When dividing, we divide M1 by M2 and subtract E2 from E1.
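
As a sketch of this rule, the following Python code multiplies and divides two floats given as (mantissa, exponent) pairs. It assumes normalized 16-bit mantissas held as signed integers with value = m / 32768 * 2^e, so a magnitude lies between 0x4000 and 0x7FFF; the names float_mul, float_div and to_value are invented here, and the low product bits are simply cut off.

```python
ONE = 1 << 15                               # mantissa scale: value = m / ONE * 2**e


def float_mul(m1, e1, m2, e2):
    if m1 == 0 or m2 == 0:
        return 0, 0
    sign = -1 if (m1 < 0) != (m2 < 0) else 1
    product = abs(m1) * abs(m2)             # multiply mantissas, scaled by 2**30
    exponent = e1 + e2                      # add the exponents
    if product < (1 << 29):                 # result below 0.5: renormalize
        product <<= 1
        exponent -= 1
    return sign * (product >> 15), exponent


def float_div(m1, e1, m2, e2):
    if m1 == 0:
        return 0, 0
    sign = -1 if (m1 < 0) != (m2 < 0) else 1
    quotient = (abs(m1) << 15) // abs(m2)   # divide mantissas, scaled by 2**15
    exponent = e1 - e2                      # subtract the exponents
    if quotient >= ONE:                     # result 1.0 or above: renormalize
        quotient >>= 1
        exponent += 1
    return sign * quotient, exponent


def to_value(m, e):
    return m / ONE * 2.0 ** e


print(to_value(*float_mul(0x4000, 1, 0x6000, 2)))   # 1.0 times 3.0
print(to_value(*float_div(0x6000, 2, 0x4000, 1)))   # 3.0 divided by 1.0
```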

The simplification when multiplying is paid for with a higher effort when adding or subtracting. Before we can add the two mantissas we have to bring their exponents to the same value (by shifting the mantissa of the smaller number to the right and increasing its exponent accordingly). Only when both exponents are equal can we add the mantissas.
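
The alignment step can be sketched like this in Python. The assumptions are the same kind as before and are not taken from the original software: signed integer mantissas with value = m / 32768 * 2^e, normalized magnitudes from 0x4000 to 0x7FFF, and a freely chosen name float_add. Bits shifted out during alignment are lost.

```python
def float_add(m1, e1, m2, e2):
    if m1 == 0:
        return m2, e2
    if m2 == 0:
        return m1, e1
    if e1 < e2:                              # make (m1, e1) the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    shift = e1 - e2
    # Shift the smaller number's mantissa right until the exponents are equal.
    aligned = (abs(m2) >> shift) * (-1 if m2 < 0 else 1)
    total = m1 + aligned                     # now the mantissas can be added
    if total == 0:
        return 0, 0
    sign = -1 if total < 0 else 1
    magnitude = abs(total)
    exponent = e1
    if magnitude > 0x7FFF:                   # carry: one shift right
        magnitude >>= 1
        exponent += 1
    while magnitude < 0x4000:                # cancellation: shift left again
        magnitude <<= 1
        exponent -= 1
    return sign * magnitude, exponent


print(float_add(0x4000, 1, 0x6000, 2))       # 1.0 + 3.0
print(float_add(0x6000, 2, -0x4000, 1))      # 3.0 - 1.0
```

Subtraction needs no routine of its own here: it is an addition with the sign of the second mantissa inverted.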

Conversion of binary to decimal number format

To demonstrate that handling binary float numbers is rather laborious, I have shown the conversion of a 24-bit float with a 16-bit mantissa to decimal in detail. The software for doing that has 410 code lines and needs a few milliseconds on an AVR. How this is done is documented on this page here. If you want to learn assembler: this is a more advanced example, with lots of pointers. I hope that you enjoy understanding a more complex task.
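
To give a rough idea of what such a conversion has to do, here is a compact Python sketch. It is not the 410-line assembler routine mentioned above and does not follow its method: it uses exact fractions where the AVR has to work with shifts and multi-byte arithmetic. The assumed format is a signed integer mantissa with value = m / 32768 * 2^e, and the name float24_to_decimal is made up.

```python
from fractions import Fraction


def float24_to_decimal(mantissa, exponent, digits=5):
    """Format mantissa / 32768 * 2**exponent as normalized decimal text."""
    value = Fraction(mantissa, 32768) * Fraction(2) ** exponent
    if value == 0:
        return "0." + "0" * (digits - 1) + "E+00"
    sign = "-" if value < 0 else ""
    value = abs(value)
    dec_exp = 0
    while value >= 10:                      # decimal normalization, as in the
        value /= 10                         # first chapter of this page
        dec_exp += 1
    while value < 1:
        value *= 10
        dec_exp -= 1
    scaled = int(value * 10 ** (digits - 1) + Fraction(1, 2))   # round
    if scaled >= 10 ** digits:              # rounding produced one digit too many
        scaled //= 10
        dec_exp += 1
    text = str(scaled)
    return f"{sign}{text[0]}.{text[1:]}E{dec_exp:+03d}"


print(float24_to_decimal(0x6000, 2))
print(float24_to_decimal(0x4000, -3))
```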


Those who are clever and do not need numbers up to 10^38 (or even larger) avoid floats and rather use integers or fixed-point numbers (pseudo-floats). Those are far simpler to handle, easier to understand, and their precision is much easier to adjust to the given practical needs.
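
A minimal sketch of such a pseudo-float, again in Python for illustration only: every value is stored as an ordinary integer that counts thousandths, so the precision is set by one constant. The scale of 1000 and all names are assumptions for this example.

```python
SCALE = 1000                     # three decimal places; adjust to your needs


def fixed_add(a, b):
    return a + b                 # plain integer addition, no alignment needed


def fixed_mul(a, b):
    # The raw product is scaled by SCALE twice; divide once, rounding half up.
    return (a * b + SCALE // 2) // SCALE


def fixed_to_text(a):
    sign = "-" if a < 0 else ""
    whole, frac = divmod(abs(a), SCALE)
    return f"{sign}{whole}.{frac:03d}"


print(fixed_to_text(fixed_add(1234, 2766)))    # 1.234 + 2.766
print(fixed_to_text(fixed_mul(1500, 2000)))    # 1.500 * 2.000
```

Addition is a single integer operation here, and only multiplication needs one extra division by the scale.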


©2021 by