Prerequisite Material from 252 (Starts Here)

Character Representation

Everything represented by a computer is represented by binary sequences.

A common non-integer type to be represented is a character. We use standard encodings (binary sequences) to represent characters.

REMEMBER: bit patterns do NOT imply a representation

A 4-bit binary quantity is called a nibble. An 8-bit binary quantity is called a byte.

Many I/O devices work with 8-bit quantities. A standard code, ASCII (American Standard Code for Information Interchange), defines which character is represented by each sequence. You'll look these up in an ASCII table.

examples:
0100 0001 is 41 (hex) or 65 (decimal). It represents 'A'
0100 0010 is 42 (hex) or 66 (decimal). It represents 'B'

Different bit patterns are used for each different character that needs to be represented.

The code has some nice properties. If the bit patterns are compared (pretending they represent integers), then 'A' < 'B'. This is convenient, because it helps with sorting things into alphabetical order.

Notes:

Because a character such as '1' has one representation as an ASCII code and another as the integer it stands for, we constantly need to convert from one to the other.

The computer does arithmetic operations on two's complement integers (and often operations on unsigned integers). The computer has the ability to read in or print out a single character representation at a time. So, any time we want to do I/O, we're working with one character at a time, and the ASCII representation of the character. Yet, lots of the time, the data represents numbers (just consider integers, for now).

Characters that represent an integer

To see how an integer is read in and then processed, consider an example. Suppose the user types the 4-character sequence 123\n.

The computer can read in a single character at a time. If it reads in exactly these 4 characters, the representations the computer will have are


ASCII                decimal                integer     8-bit two's comp.
character    hex     value     binary      desired     representation

'1'          0x31    49        00110001    1           00000001
'2'          0x32    50        00110010    2           00000010
'3'          0x33    51        00110011    3           00000011
'\n'         0x0a    10        00001010    (NA)        (NA)

From this example, it should be easy to see that converting a single ASCII character representation to the desired two's complement integer representation requires an integer subtraction.

integer rep desired = ASCII representation - 48     (48 is 0x30, the ASCII code for '0')

What we need is an algorithm for translating multi-character strings to the integers they represent, and vice versa.

ALGORITHM:   character string --> integer
   the steps:

      for '3' '5' '4'
      integer = 0

      read '3'
      translate '3' to 3
      integer =  integer * 10  + 3 = 3

      read '5'
      translate '5' to 5
      integer =  integer * 10  + 5 = 35

      read '4'
      translate '4' to 4
      integer =  integer * 10  + 4 = 354

the algorithm:

     integer = 0
     while there are more characters
       get character
       digit <-  character - 48
       integer <- integer * 10  + digit

Going the other direction for translation (integer to set of characters represented, printed out in the correct order), we partially reverse the algorithm.

ALGORITHM:  integer --> character string
   the steps:
   For 354, figure out how many characters there are in
   the base desired (3).

   Figure out base^(number of characters - 1):  10^2 = 100

     354 div 100 gives 3
     translate 3 to '3' and print it out
     354 % 100 gives 54
     100/10 = 10

     54 div 10 gives 5
     translate 5 to '5' and print it out
     54 mod 10 gives 4
     10/10 = 1

     4 div 1 gives 4
     translate 4 to '4' and print it out
     4 mod 1 gives 0
     1/10 = 0, so you're done

written in a form using two loops:

     # find the largest power of the base that is <= the integer
     # (this is base^(number of characters - 1); starting at 1
     #  handles single-digit integers correctly)
     power_of_base = 1
     while power_of_base * base <= integer
         power_of_base = power_of_base * base

     while power_of_base != 0
         digit = integer / power_of_base       # quotient
         char_to_print = digit + 48
         print char_to_print
         integer = integer % power_of_base     # remainder after integer division
         power_of_base = power_of_base / base  # quotient

Prerequisite Material from 252 (Ends Here)


Copyright © Karen Miller, 2006