Prerequisite Material from 252 (Starts Here)
Motivation for Registers
REGISTERS and MAL
-----------------
An introduction to the subject of registers -- from a motivational
point of view.
This lecture is an attempt to explain a bit about why computers
are designed (currently) the way they are. Try to remember that
speed of program execution is an important goal. Desire for increased
speed drives the design of computer hardware.
The impediment to speed (currently): transferring data to and from
memory.
look at an invented instruction:
add x, y, z
- x, y, and z must all be addresses of data in memory.
-each address is 32 bits.
- what does the machine code look like?
----------------------------------------
| add | x | y | z |
----------------------------------------
----------------------------------------
| opcode| address | address | address |
----------------------------------------
8(?) 32 32 32
so, this instruction requires more than 96 bits.
IF each read from memory delivers 32 bits of data,
then it takes a lot of reads before this instruction can
be completed.
at least 3 for instruction fetch
1 to load y
1 to load z
1 to store x
that's 6 transactions with memory for 1 instruction!
How bad is the problem?
Assume that a 32-bit 2's complement addition takes 1 time unit.
A read/write from/to memory takes about 10 time units.
(Note that this is a very conservative estimate; the ratio is
more like 1:16.)
So we get
fetch instruction 30 time units
(and update PC)
decode 1 time unit
load y 10 time units
load z 10 time units
add 1 time unit
store x 10 time units
---------------------------------
total time: 62 time units
60/62, or about 97%, of the processor's time is spent doing memory operations.
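The arithmetic above can be checked with a small sketch (Python here purely as a calculator; the 10:1 memory/ALU cost ratio and the 3-read fetch are the notes' assumptions):

```python
# Toy cost model for the invented 3-address "add x, y, z".
# Costs come from the notes: a memory transaction is 10 time units,
# decode and a 32-bit add are 1 each, and the instruction fetch
# takes 3 reads of 32 bits.
MEM = 10   # time units per memory read/write
ALU = 1    # time units per decode or ALU operation

fetch  = 3 * MEM      # fetch instruction (and update PC)
decode = ALU
loads  = 2 * MEM      # load y, load z
add    = ALU
store  = MEM          # store x

total    = fetch + decode + loads + add + store
mem_time = fetch + loads + store
print(total)                              # 62
print(round(100 * mem_time / total, 1))   # 96.8
```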
what do we do to reduce this number?
1. transfer more data at one time
if we transfer 64 bits at one time, then it only takes 2 reads
to get the instruction. There are no savings in loading/storing
the operands, and an extra word's worth of data is transferred
with each load -- a waste of resources.
So, this idea saves only 1 memory transaction.
With the invented example instruction:
64 bits 128 bits
fetch instruction: 20 10
decode 1 1
load y 10 10
load z 10 10
add 1 1
store x 10 10
--------------------------------- -----
total time: 52 42
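The same model reproduces the wider-bus columns; the only thing that changes is how many bus transfers the fetch needs (those counts are taken straight from the table above):

```python
# Bus-width comparison for the invented "add x, y, z".
# Per the notes' model, only the instruction fetch benefits from a
# wider bus (3 reads at 32 bits, 2 at 64 bits, 1 at 128 bits);
# each load or store still costs one 10-unit transaction.
MEM, ALU = 10, 1

def total(fetch_reads):
    # fetch + decode + load y + load z + add + store x
    return fetch_reads * MEM + ALU + 2 * MEM + ALU + MEM

print(total(3))   # 62 -- 32-bit bus
print(total(2))   # 52 -- 64-bit bus
print(total(1))   # 42 -- 128-bit bus
```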
2. shorten addresses. This restricts where variables can be placed.
First, make each address be 16 bits (instead of 32). Then
add x, y, z
requires 2 words for instruction fetch.
Shorten addresses even more . . . make them each 5 bits long.
Problem: that leaves only 32 words of data for operand storage.
So, use extra move instructions that allow moving data from
a 32-bit address to one of these special 32 words.
Then, the add can fit into 1 transferred word.
With the invented example instruction:
32 bits transferred 32 bits transferred
16-bit addr 5-bit addr
fetch instruction: 20 10
decode 1 1
load y 10 10
load z 10 10
add 1 1
store x 10 10
--------------------------------- -----
total time: 52 42
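How much shorter addresses shrink the instruction can be sketched the same way (the 8-bit opcode is the notes' guess; `instr_words` is a made-up helper name):

```python
import math

# How many 32-bit words does "add x, y, z" occupy for a given
# address width?  Assumes the 8-bit opcode guessed in the notes.
def instr_words(addr_bits, opcode_bits=8, word_bits=32):
    return math.ceil((opcode_bits + 3 * addr_bits) / word_bits)

print(instr_words(16))  # 2 -- 8 + 3*16 = 56 bits, two fetch reads
print(instr_words(5))   # 1 -- 8 + 3*5  = 23 bits, one fetch read
```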
3. modify the instruction set such that instructions are smaller.
This was common on machines from more than a decade ago.
(It is still part of the IA-32 architecture from Intel)
Here's how it works:
The invented instruction implies what is called a 3-address machine.
Each arithmetic type instruction contains 3 operands, 2 for sources
and 1 for the destination of the result.
To reduce the number of operands (and thereby reduce the number
of reads for the instruction fetch), develop an instruction set
that uses 2 operands for arithmetic type instructions.
(Called a 2-address machine.)
Now, instead of add x, y, z
we will have move x, z (copies the value of z into x)
add x, y ( x <- x + y )
so, arithmetic type instructions always use one of the operands
as both a source and a destination.
There are a couple of problems with this approach:
- where 1 instruction was executed before, 2 are now executed.
It actually takes more memory transactions to execute this sequence!
at least 2 to fetch each instruction (4 reads)
1 for each load/store of an operand (4 more)
that is 8 reads/writes for the same sequence.
32 bits 64 bits
move add move add
fetch instruction: 20 20 10 10
decode 1 1 1 1
load operand 10 10 10 10
operation 0 1 0 1
store 10 10 10 10
--------------------------------- -----------
sum: 41 42 31 32
total: 83 63
(Is this better than for the 3-address machine? No -- 83 vs. 62
time units on a 32-bit bus.)
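Costing the move/add sequence with the same toy model reproduces the table (the instruction shapes -- fetch reads, operand loads, stores -- are the ones charged in the table above):

```python
# move x, z ; add x, y  -- costed with the notes' model.
MEM, ALU = 10, 1

def instr(fetch_reads, loads, ops, stores):
    return fetch_reads*MEM + ALU + loads*MEM + ops*ALU + stores*MEM

# 32-bit bus: each 2-address instruction takes 2 fetch reads
move32 = instr(2, 1, 0, 1)   # 41
add32  = instr(2, 1, 1, 1)   # 42
# 64-bit bus: 1 fetch read each
move64 = instr(1, 1, 0, 1)   # 31
add64  = instr(1, 1, 1, 1)   # 32
print(move32 + add32, move64 + add64)   # 83 63
```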
So, allow only 1 operand -- called a 1-address format.
now, the instruction add x, y, z will be accomplished
by something like
load z
add y
store x
to facilitate this, there is an implied word of storage
associated with the ALU. All results of instructions
are placed into this word -- called an ACCUMULATOR.
the operation of the sequence:
load z -- read from memory at address z, and place value into
the accumulator
add y -- implied operation is to add the contents of the
accumulator with the operand, and place the result
back into the accumulator.
store x -- write to memory at address x; the value is the contents
of the accumulator
Notice that this 1-address instruction format implies the use
of a variable (the accumulator).
How many memory transactions does it take?
2 -- (load) at least 1 for instruction fetch, 1 for read of z
2 -- (add) at least 1 for instruction fetch, 1 for read of y
2 -- (store) at least 1 for instruction fetch, 1 for write of x
---
6 the same as for the 3-address machine -- no savings.
32 bits transferred
load add store
fetch instruction: 10 10 10
decode 1 1 1
load operand 10 10 0
operation 0 1 0
store 0 0 10
---------------------------------
sum: 21 22 21
total: 64
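The 1-address column totals check out the same way (instruction shapes again taken from the table above):

```python
# load z ; add y ; store x  -- 1-address machine, notes' model.
MEM, ALU = 10, 1

def instr(fetch_reads, loads, ops, stores):
    return fetch_reads*MEM + ALU + loads*MEM + ops*ALU + stores*MEM

load  = instr(1, 1, 0, 0)   # 21
add   = instr(1, 1, 1, 0)   # 22
store = instr(1, 0, 0, 1)   # 21
print(load + add + store)   # 64
```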
BUT, what if we wanted
x = (y + z) / 3
For the 3-address machine, the operation following the add is
div x, x, 3
3-address machine 32 bits
add div
fetch instruction 30 30
decode 1 1
load one operand 10 10
load other operand 10 0 (immediate is in instruction)
add 1 1
store x 10 10
---------------------------------
sum: 62 52
total: 114
For the 1-address machine, the value for x is already in the
accumulator, and the code on the 1-address machine could be
load z
add y
div 3
store x
there is only 1 extra instruction (1 extra memory transaction,
since the immediate 3 is carried in the instruction) for this
whole sequence!
1-address machine 32 bits
load add div store
fetch instruction 10 10 10 10
decode 1 1 1 1
load operand 10 10 0 0
operation 0 1 1 0
store 0 0 0 10
-------------------------------------------
sum: 21 22 12 21
total: 76
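The two machines' totals for x = (y + z) / 3 can be checked with the same model (per-instruction shapes come from the two tables above):

```python
# x = (y + z) / 3 on both machines, notes' model.
MEM, ALU = 10, 1

def instr(fetch_reads, loads, ops, stores):
    return fetch_reads*MEM + ALU + loads*MEM + ops*ALU + stores*MEM

# 3-address: add x,y,z ; div x,x,3 (the immediate 3 is in the instruction)
three_addr = instr(3, 2, 1, 1) + instr(3, 1, 1, 1)
# 1-address: load z ; add y ; div 3 ; store x (accumulator keeps y+z)
one_addr = (instr(1, 1, 0, 0) + instr(1, 1, 1, 0)
            + instr(1, 0, 1, 0) + instr(1, 0, 0, 1))
print(three_addr, one_addr)   # 114 76
```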
REMEMBER this: the 1-address machine uses an extra word of storage
that is located in the CPU (processor).
the example shows a savings in memory transactions
when a value is re-used.
NOW, put a couple of these ideas together.
Use of storage in processor (accumulator) allowed re-use of data.
It is easy to design -- put a bunch of storage in the processor --
call them REGISTERS. How about 32 of them? Then, restrict
arithmetic instructions to only use registers as operands.
add x, y, z
becomes something more like
load reg10, y
load reg11, z
add reg12, reg11, reg10
store x, reg12
presuming that the values of x, y, and z will be used again,
the cost of the loads is amortized over those later uses.
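A rough transaction count shows why registers win only when data is re-used (the one-word-per-instruction assumption and the per-op transaction counts are mine, following the notes' model):

```python
# Memory transactions to do n_ops operations on y and z.
# Register machine: 2 loads + n_ops register-register ops + 1 store,
# each instruction assumed to fit in one word (one fetch read).
# Memory-to-memory 3-address machine: every op pays 3 fetch reads,
# 2 operand loads, and 1 store.
def reg_machine(n_ops):
    fetches = 2 + n_ops + 1      # one fetch per instruction
    return fetches + 2 + 1       # + 2 data reads + 1 write

def mem_machine(n_ops):
    return n_ops * 6

print(reg_machine(1), mem_machine(1))   # 7 6   -- one op: registers lose
print(reg_machine(5), mem_machine(5))   # 11 30 -- re-use: registers win
```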
The MIPS R2000 architecture does this. It has
1. 32 32-bit registers.
2. Arithmetic/logical instructions use register values as operands.
A set up like this where arith/logical instr. use only registers
for operands is called a LOAD/STORE architecture. The only way to
access data within memory is to use an explicit instruction (load)
to read the data from memory and copy it into a register.
A computer that allows operands to come from main memory is often
called a MEMORY TO MEMORY architecture, although that term is not
universal.
Load/store architectures are common today. They have the advantages
1. instructions can be fixed length (and short)
2. their design easily permits pipelining, making load/store
architectures faster
Prerequisite Material from 252 (Ends Here)
Copyright © Karen Miller, 2006