Floating Point Representation

Computers represent real values in a form similar to that of scientific notation. Consider the value

1.23 x 10^4

The number has a sign (+ in this case)
The significand (1.23) is written with one non-zero digit to the left of the decimal point.
The base (radix) is 10.
The exponent (an integer value) is 4. It too must have a sign.

There are standards which define what the representation means, so that there will be consistency across computers.

Note that this is not the only way to represent floating point numbers; it is just the IEEE standard way of doing it.

Here is what we do:

the representation has three fields:

     ----------------------------
     | S |   E     |     F      |
     ----------------------------

S is one bit representing the sign of the number
E is an 8-bit biased integer representing the exponent
F is a 23-bit unsigned integer (the fraction, or mantissa, field)

the decimal value represented is:

	  (-1)^S  x  f  x  2^e
where
	    e = E - bias

	    f = ( F/(2^n) ) + 1

for single precision representation (the emphasis in this class)
n = 23
bias = 127

for double precision representation (a 64-bit representation)
n = 52 (there are 52 bits for the mantissa field)
bias = 1023 (there are 11 bits for the exponent field)
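The decode formula above can be checked directly. Here is a small Python sketch (the function name decode_single is mine, and the special-case patterns E = 0 and E = 255 are ignored) that pulls S, E, and F out of a 32-bit pattern and applies the single precision parameters:

```python
def decode_single(bits):
    """Decode a 32-bit IEEE single precision pattern (given as an int)
    using  (-1)^S x (1 + F/2^23) x 2^(E - 127).
    The special cases E == 0 and E == 255 are ignored here."""
    S = (bits >> 31) & 0x1       # 1 sign bit
    E = (bits >> 23) & 0xFF      # 8-bit biased exponent field
    F = bits & 0x7FFFFF          # 23-bit mantissa field
    e = E - 127                  # subtract out the bias
    f = 1 + F / 2**23            # put back the implied leading 1
    return (-1)**S * f * 2**e

print(decode_single(0x3F800000))   # 1.0
print(decode_single(0xC0490000))   # -3.140625
```

The same code handles double precision if the shifts, masks, n, and bias are swapped for the 64-bit parameters (n = 52, bias = 1023).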


Biased Integer Representation

Since floating point representations use biased integer representations for the exponent field, here is a brief discussion of biased integers.

Biased representation is an integer representation that skews the bit patterns so that they look just like unsigned values but actually represent negative numbers as well.

It represents a range of values (different from unsigned representation) using the unsigned representation. Another way of saying this: biased representation is a re-mapping of the unsigned integers.

visual example (of the re-mapping):

        bit pattern:        000  001  010  011  100  101  110  111

        unsigned value:      0    1    2    3    4    5    6    7

        biased-2 value:     -2   -1    0    1    2    3    4    5

This is biased-2. Note the dash character in the name of this representation. It is not a negative sign.

Example:

Given 4 bits, bias values by 2^3 = 8
(This choice of bias results in approximately half the represented values being negative.)

	  TRUE VALUE to be represented      3
	  add in the bias                  +8
					 ----
	  unsigned value                   11

	  so the 4-bit, biased-8 representation of the value 3
	  will be  1011

Example:

	  Going the other way, suppose we were given a
	  4-bit, biased-8 representation of   0110

	  unsigned 0110  represents 6
	  subtract out the bias   - 8
				  ----
	  TRUE VALUE represented   -2
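The two examples above are just an add and a subtract, which a short Python sketch makes explicit (the function names are mine):

```python
def to_biased(value, bias):
    """Return the unsigned bit pattern (as an int) that represents
    value in biased-`bias` form: just add in the bias."""
    return value + bias

def from_biased(pattern, bias):
    """Recover the true value from an unsigned biased pattern:
    just subtract out the bias."""
    return pattern - bias

# the 4-bit, biased-8 examples from the text:
print(format(to_biased(3, 8), '04b'))   # 1011
print(from_biased(0b0110, 8))           # -2
```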

On choosing a bias:
The bias chosen is most often based on the number of bits available for representing an integer. To get an approximately equal distribution of values above and below 0, the bias should be

      2 ^ (n-1)      or   (2^(n-1)) - 1

Now, what does all this mean?

An example: Put the decimal number 64.2 into the IEEE standard single precision floating point representation.

	first step:
	  get a binary representation for 64.2
	  to do this, get unsigned binary representations for the stuff to the left
	  and right of the decimal point separately.

	  64  is   1000000

	  .2 can be converted using the repeated-multiplication algorithm:

	  .2 x 2 =  0.4      0
	  .4 x 2 =  0.8      0
	  .8 x 2 =  1.6      1
	  .6 x 2 =  1.2      1

	  .2 x 2 =  0.4      0  now this whole pattern (0011) repeats.
	  .4 x 2 =  0.8      0
	  .8 x 2 =  1.6      1
	  .6 x 2 =  1.2      1
	    

	    so a binary representation for .2  is    .001100110011. . .

	         ----
	    or  .0011  (The bar over the top shows which bits repeat.)
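The multiply-by-2 steps above can be sketched in Python (the function name frac_bits is mine; Fraction is used so that .2 is held exactly while the bits are peeled off):

```python
from fractions import Fraction

def frac_bits(x, n):
    """First n binary digits of the fraction x (0 <= x < 1),
    found by repeatedly multiplying by 2: each multiply either
    crosses 1 (emit a 1 and subtract it off) or does not (emit a 0)."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)      # the digit to the left of the point
        bits.append(str(bit))
        if bit:
            x -= 1             # keep only the fractional part
    return ''.join(bits)

print(frac_bits(Fraction(1, 5), 12))   # 001100110011
```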


	Putting the halves back together again:
	   64.2  is     1000000.0011001100110011. . .


      second step:
	Normalize the binary representation. (make it look like
	scientific notation)

	1.000000 00110011. . .  x  2^6

      third step:
	6 is the true exponent.  For the standard form, it needs to
	be in 8-bit, biased-127 representation.

	      6
	  + 127
	  -----
	    133

	133 in 8-bit, unsigned representation is 1000 0101

	This is the bit pattern used for E in the standard form.

      fourth step:
	the mantissa stored (F) is the stuff to the right of the radix point
	in the normalized form.  We need 23 bits of it.

	  000000 00110011001100110


      put it all together (and include the correct sign bit):

	 S     E               F
	 0  10000101  00000000110011001100110

      the values are often given in hex, so here it is

	 0100 0010 1000 0000 0110 0110 0110 0110

     0x   4    2    8    0    6    6    6    6
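As a check, the three fields can be reassembled into one 32-bit word and compared against Python's own single precision packing (a sketch; struct rounds to nearest, which here agrees with the truncation above because the first discarded bit is a 0):

```python
import struct

# assemble the fields worked out above into one 32-bit word
S = 0
E = 0b10000101                   # the biased exponent, 133
F = 0b00000000110011001100110    # the 23-bit mantissa field
word = (S << 31) | (E << 23) | F
print(hex(word))                 # 0x42806666

# cross-check against Python's single precision packing of 64.2
packed, = struct.unpack('>I', struct.pack('>f', 64.2))
assert packed == word
```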

Some extra details:

Copyright © Karen Miller, 2006