Computers represent real values in a form similar to that of scientific notation. Consider the value
1.23 x 10^4
The number has a sign (+ in this case)
The significand (1.23) is written with one non-zero digit
to the left of the decimal point.
The base (radix) is 10.
The exponent (an integer value) is 4. It too must have a sign.
There are standards that define what the representation means, so that across computers there will be consistency.
Note that this is not the only way to represent floating point numbers; it is just the IEEE standard way of doing it.
Here is what we do:
the representation has three fields:
     -----------------------------
    |  S  |    E    |      F      |
     -----------------------------
the decimal value represented is:
    (-1)^S  x  f  x  2^e

where

    e = E - bias
    f = ( F/(2^n) ) + 1
for single precision representation (the emphasis in this class)
n = 23
bias = 127
for double precision representation (a 64-bit representation)
n = 52 (there are 52 bits for the mantissa field)
bias = 1023 (there are 11 bits for the exponent field)
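The value formula above can be sketched in Python. This is a made-up helper (decode_single is not a standard function), and it handles only the normalized case, where the hidden bit is a 1; the special cases are covered later in these notes.

```python
def decode_single(s, e_field, f_field):
    """Compute the value represented by the three fields of an IEEE
    single-precision number (normalized case only)."""
    n = 23        # bits in the mantissa field F
    bias = 127    # single-precision exponent bias
    e = e_field - bias            # true exponent:  e = E - bias
    f = f_field / (2 ** n) + 1    # significand:    f = F/(2^n) + 1
    return (-1) ** s * f * 2 ** e

# S=0, E=10000101 (133), F=0 represents (+1) x 1.0 x 2^6 = 64.0
print(decode_single(0, 0b10000101, 0))   # 64.0
```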
Since floating point representations use biased integer representations for the exponent field, here is a brief discussion of biased integers.
Biased representation is an integer representation that skews the bit patterns so that they look just like unsigned values, but actually represent negative numbers as well.
It represents a range of values (different from unsigned representation) using the unsigned representation. Another way of saying this: biased representation is a re-mapping of the unsigned integers.
visual example (of the re-mapping):
    bit pattern:      000   001   010   011   100   101   110   111
    unsigned value:     0     1     2     3     4     5     6     7
    biased-2 value:    -2    -1     0     1     2     3     4     5

This is biased-2. Note the dash character in the name of this representation. It is not a negative sign.
Example:
Given 4 bits, bias values by 2^3 = 8
(This choice of bias results in approximately half the
represented values being negative.)
    TRUE VALUE to be represented      3
    add in the bias                  +8
                                   ----
    unsigned value                   11

so the 4-bit, biased-8 representation of the value 3 will be 1011
Example:
Going the other way, suppose we were given the 4-bit, biased-8 representation 0110.

    unsigned 0110 represents          6
    subtract out the bias            -8
                                   ----
    TRUE VALUE represented           -2
On choosing a bias:
The bias chosen is most often based on the number of bits
available for representing an integer. To get an approx.
equal distribution of values above and below 0,
the bias should be
2 ^ (n-1) or (2^(n-1)) - 1
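The two worked examples can be checked with a short sketch. The function names to_biased and from_biased are made up for illustration; the bias is 2^(n-1), as suggested above.

```python
def to_biased(value, n_bits):
    """Encode a true value in n-bit biased-2^(n-1) representation,
    returned as a bit string."""
    bias = 2 ** (n_bits - 1)
    return format(value + bias, '0{}b'.format(n_bits))

def from_biased(bits):
    """Decode a biased bit string back to the true value."""
    bias = 2 ** (len(bits) - 1)
    return int(bits, 2) - bias

print(to_biased(3, 4))       # 1011  (3 + 8 = 11, written unsigned)
print(from_biased('0110'))   # -2    (6 - 8 = -2)
```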
Now, what does all this mean?
An example: Put the decimal number 64.2 into the IEEE standard single precision floating point representation.
first step: get a binary representation for 64.2

  To do this, get unsigned binary representations for the parts to the
  left and right of the decimal point separately.

  64 is  1000000

  .2 can be gotten using the algorithm:

      .2 x 2 = 0.4    0
      .4 x 2 = 0.8    0
      .8 x 2 = 1.6    1
      .6 x 2 = 1.2    1

      .2 x 2 = 0.4    0    now this whole pattern (0011) repeats
      .4 x 2 = 0.8    0
      .8 x 2 = 1.6    1
      .6 x 2 = 1.2    1

  so a binary representation for .2 is

      .001100110011. . .

          ____
  or  .0011        (The bar over the top shows which bits repeat.)

  Putting the halves back together again:

      64.2  is  1000000.0011001100110011. . .

second step: normalize the binary representation
  (make it look like scientific notation)

      1.0000000011001100110011. . .  x  2^6

third step: 6 is the true exponent. For the standard form, it needs
  to be in 8-bit, biased-127 representation.

        6
     +127
     ----
      133

  133 in 8-bit, unsigned representation is 1000 0101.
  This is the bit pattern used for E in the standard form.

fourth step: the mantissa stored (F) is the stuff to the right of the
  radix point in the normalized form. We need 23 bits of it.

      000000 00110011001100110

  Put it all together (and include the correct sign bit):

      S   E          F
      0   10000101   00000000110011001100110

  The values are often given in hex, so here it is:

      0100 0010 1000 0000 0110 0110 0110 0110

      0x    4    2    8    0    6    6    6    6
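The hand conversion above can be checked against the machine's own single-precision encoding using Python's struct module (this check is an addition, not part of the original notes):

```python
import struct

# Pack 64.2 as a big-endian IEEE single-precision float and
# look at the resulting bit pattern.
bits = struct.pack('>f', 64.2)
print(bits.hex())   # 42806666, matching the hand conversion

# Pull the three fields back out of the 32-bit pattern.
word = int.from_bytes(bits, 'big')
s = word >> 31              # sign bit
e = (word >> 23) & 0xff     # 8-bit biased exponent
f = word & 0x7fffff         # 23-bit mantissa field
print(s, format(e, '08b'), format(f, '023b'))
```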
Some extra details:
We take the bit patterns 0x0000 0000 and 0x8000 0000 to represent the value 0.
(What floating point numbers cannot be represented because of this?)
Note that the hardware that does arithmetic on floating point numbers must constantly check whether to use a hidden bit of 1 or a hidden bit of 0 (for 0.0).
Values that are very close to 0.0, and would require the hidden bit to be a zero are called denormalized or subnormal numbers.
                  S        E          F
    0.0           0 or 1   00000000   00000000000000000000000   (hidden bit is a 0)
    subnormal     0 or 1   00000000   not all zeros             (hidden bit is a 0)
    normalized    0 or 1   > 0        any bit pattern           (hidden bit is a 1)
                  S        E          F
    +infinity     0        11111111   00000...                  (0x7f80 0000)
    -infinity     1        11111111   00000...                  (0xff80 0000)
    NaN (Not a Number)     11111111   ?????...

    (For NaN: S is either 0 or 1, E = 0xff, and F is anything but all zeros.)
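The special-case tables above can be summarized as a small classifier over a 32-bit pattern. classify is a made-up name; the logic simply follows the tables.

```python
def classify(word):
    """Classify a 32-bit IEEE single-precision bit pattern
    according to the tables above."""
    e = (word >> 23) & 0xff    # 8-bit exponent field
    f = word & 0x7fffff        # 23-bit mantissa field
    if e == 0:
        return 'zero' if f == 0 else 'subnormal'
    if e == 0xff:
        return 'infinity' if f == 0 else 'NaN'
    return 'normalized'

print(classify(0x00000000))  # zero        (and 0x80000000 is also zero)
print(classify(0x7f800000))  # infinity
print(classify(0x7fc00000))  # NaN
print(classify(0x42806666))  # normalized  (64.2, from the earlier example)
```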
For double precision, the special values follow the same pattern, with an 11-bit exponent field of all ones:

                  S        E (11 bits)   F (52 bits)
    +infinity     0        11111111111   all zeros              (0x7ff0 0000 0000 0000)
    -infinity     1        11111111111   all zeros              (0xfff0 0000 0000 0000)
    NaN           0 or 1   11111111111   anything but all zeros
Copyright © Karen Miller, 2006