Everything represented by a computer is represented by binary sequences.
A common non-integer to be represented is a character. We use standard encodings (binary sequences) to represent characters.
REMEMBER: bit patterns do NOT imply a representation
A 4-bit binary quantity is called a nibble. An 8-bit binary quantity is called a byte.
Many I/O devices work with 8-bit quantities. A standard code, ASCII (American Standard Code for Information Interchange), defines which character is represented by each sequence. You'll look these up in an ASCII table.
examples:
0100 0001 is 41 (hex) or 65 (decimal). It represents 'A'.
0100 0010 is 42 (hex) or 66 (decimal). It represents 'B'.
Different bit patterns are used for each different character that needs to be represented.
The code has some nice properties. If the bit patterns are compared (pretending they represent integers), then 'A' < 'B'. This is good, because it helps with sorting things into alphabetical order.
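As a concrete illustration, here is a small C sketch (not from the original notes) that prints the same bit pattern both as a character and as an integer, and compares two characters as integers:

    #include <stdio.h>

    int main(void) {
        char a = 'A';   /* bit pattern 0100 0001 */
        char b = 'B';   /* bit pattern 0100 0010 */

        /* the same bits printed as a character, a decimal integer, and hex */
        printf("%c is %d (0x%x)\n", a, a, (unsigned)a);   /* A is 65 (0x41) */
        printf("%c is %d (0x%x)\n", b, b, (unsigned)b);   /* B is 66 (0x42) */

        /* comparing the bit patterns as integers gives alphabetical order */
        if (a < b)
            printf("'A' < 'B'\n");
        return 0;
    }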
Notes:
0000 0000 0000 0000 0000 0000 0000 1000 is the 32-bit two's complement representation of the integer 8.
0011 1000 is the 8-bit ASCII representation of the character '8'.
Because of this difference between the integer representation of a value and the character representation of that same value, we constantly need to convert from one to the other.
The computer does arithmetic operations on two's complement integers (and often operations on unsigned integers). The computer has the ability to read in or print out a single character representation at a time. So, any time we want to do I/O, we're working with one character at a time, and the ASCII representation of the character. Yet, lots of the time, the data represents numbers (just consider integers, for now).
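To make the difference concrete, here is a minimal C sketch contrasting the two representations shown in the Notes above (the variable names are mine):

    #include <stdio.h>

    int main(void) {
        int  i = 8;    /* two's complement integer:  ...0000 1000        */
        char c = '8';  /* ASCII character:           0011 1000 (0x38)    */

        printf("the integer 8 has value %d\n", i);       /* prints 8  */
        printf("the character '8' has value %d\n", c);   /* prints 56 */
        return 0;
    }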
To see how an integer is read in and then processed, consider an example.
Suppose the user types the 4-character sequence 123\n (the digits 1, 2, 3 followed by a newline).
The computer can read in a single character at a time. If it reads in exactly these 4 characters, then the representations the computer will have are:
            ASCII   decimal   binary     integer   8-bit two's comp.
character   hex     value                desired   representation
'1'         0x31    49        00110001   1         00000001
'2'         0x32    50        00110010   2         00000010
'3'         0x33    51        00110011   3         00000011
'\n'        0x0a    10        00001010   (NA)      (NA)
From this example, it should be easy to see that converting a single ASCII character representation to the desired two's complement integer representation takes just an integer subtraction:
integer rep desired = ASCII representation - 48
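In C this subtraction can be written directly. Since 48 (0x30) is the ASCII code for the character '0', the subtraction is usually written as c - '0'; a minimal sketch:

    #include <stdio.h>

    int main(void) {
        char c = '3';           /* ASCII 0x33, decimal 51 */
        int digit = c - 48;     /* 51 - 48 = 3 */
        int same  = c - '0';    /* the same subtraction; '0' is 48 */
        printf("%d %d\n", digit, same);   /* prints: 3 3 */
        return 0;
    }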
What we need is an algorithm for translating multi-character strings to the integers they represent, and vice versa.
ALGORITHM: character string --> integer

the steps, for '3' '5' '4':

    integer = 0
    read '3'    translate '3' to 3    integer = integer * 10 + 3 = 3
    read '5'    translate '5' to 5    integer = integer * 10 + 5 = 35
    read '4'    translate '4' to 4    integer = integer * 10 + 4 = 354
the algorithm:
    integer = 0
    while there are more characters
        get character
        digit <- character - 48
        integer <- integer * 10 + digit
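Here is one way this loop might look in C (the function name read_integer is my choice, and the sketch assumes the input is all digit characters, with no sign and no error checking):

    #include <stdio.h>

    /* read digit characters up to a newline; return the integer
       they represent (assumes the input really is all digits) */
    int read_integer(void) {
        int integer = 0;
        int c = getchar();
        while (c != '\n' && c != EOF) {
            int digit = c - 48;             /* translate character to digit */
            integer = integer * 10 + digit; /* shift left one decimal place */
            c = getchar();
        }
        return integer;
    }

    int main(void) {
        printf("%d\n", read_integer());   /* typing 123 then Enter prints 123 */
        return 0;
    }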
Going the other direction (translating an integer into the sequence of characters that represents it, printed out in the correct order), we partially reverse the algorithm.
ALGORITHM: integer --> character string

the steps, for 354:

    figure out how many characters there are in the base desired (3)
    figure out base^(number of characters - 1):  10^2 = 100

    354 div 100 gives 3     translate 3 to '3' and print it out
    354 mod 100 gives 54    100/10 = 10
    54 div 10 gives 5       translate 5 to '5' and print it out
    54 mod 10 gives 4       10/10 = 1
    4 div 1 gives 4         translate 4 to '4' and print it out
    4 mod 1 gives 0         1/10 = 0, so you're done
written in a form using two loops:
    # figure out base^(number of characters - 1);
    # start at base^0 = 1 so that single-digit integers work
    power_of_base = 1
    while power_of_base * base <= integer
        power_of_base = power_of_base * base

    while power_of_base != 0
        digit = integer / power_of_base        # leading digit (integer division)
        char_to_print = digit + 48
        print char_to_print
        integer = integer % power_of_base      # remainder after integer division
        power_of_base = power_of_base / base   # quotient
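Here is one way the two-loop form might look in C for base 10 (the function name print_integer is my choice; the sketch assumes a non-negative integer):

    #include <stdio.h>

    void print_integer(int integer) {
        int base = 10;

        /* first loop: find base^(number of digits - 1) */
        int power_of_base = 1;
        while (power_of_base * base <= integer)
            power_of_base = power_of_base * base;

        /* second loop: peel off one digit at a time, most significant first */
        while (power_of_base != 0) {
            int digit = integer / power_of_base;   /* leading digit */
            putchar(digit + '0');                  /* translate digit to character */
            integer = integer % power_of_base;     /* remainder after division */
            power_of_base = power_of_base / base;  /* quotient */
        }
        putchar('\n');
    }

    int main(void) {
        print_integer(354);   /* prints 354 */
        return 0;
    }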
Copyright © Karen Miller, 2006