Finally, for extra credit, a specification is provided for a mechanism to
generate an interrupt after executing some count of instructions. The Instruction
Issue Counter, IIC, is a 16-bit register which is decremented for each instruction
issued and stops counting when it reaches zero. When the register is decremented
from one to zero, an interrupt is generated, and the PC is saved in a register
called EPC. To return from the interrupt, there is an RTI instruction which loads
PC from EPC.
Instruction formats
WISC-SP06 supports instructions in four different formats: J-format, 2 I-formats, and the R-format. These are described below.
J-format
5 bits | 11 bits |
Op Code | Displacement |
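The Jump instruction loads the PC with the address of the next sequential instruction (PC+1) plus the sign-extended 11-bit displacement. The Jump-And-Link instruction loads the PC with the same value and also saves the address of the next sequential instruction (i.e., PC+1) in the link register R7.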
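As a rough illustration of the J-format semantics, the Python sketch below computes the jump target from the 11-bit displacement. The names `sext11`, `jump`, and `jump_and_link` are hypothetical and not part of the WISC-SP06 specification, and the 16-bit wrap-around assumes that the PC is as wide as the data registers.

```python
MASK16 = 0xFFFF  # assumption: the PC is 16 bits wide, like the data registers


def sext11(disp: int) -> int:
    """Interpret an 11-bit field as a signed two's-complement integer."""
    disp &= 0x7FF
    return disp - 0x800 if disp & 0x400 else disp


def jump(pc: int, disp11: int) -> int:
    """J: PC <- PC + 1 + D(sign ext.)."""
    return (pc + 1 + sext11(disp11)) & MASK16


def jump_and_link(pc: int, disp11: int) -> tuple[int, int]:
    """JAL: return (new PC, value written to the link register R7)."""
    return jump(pc, disp11), (pc + 1) & MASK16
```

A displacement of -1 (`0x7FF`) therefore jumps back to the jump instruction itself, because the displacement is relative to PC + 1.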
The syntax of the jump instructions is given in the J and JAL rows of the instruction table below.
I-format 1
5 bits | 3 bits | 3 bits | 5 bits |
Op Code | Rs | Rd | Immediate |
The I-format 1 instructions include XOR-Immediate, ANDN-Immediate, Add-Immediate, Subtract-Immediate, Rotate-Left-Immediate, Shift-Left-Logical-Immediate, Shift-Right-Arithmetic-Immediate, Shift-Right-Logical-Immediate, Load, Store, and Store with Update.
The ANDNI instruction loads register Rd with the value of the register Rs AND-ed with the one's complement of the zero-extended immediate value. (It may be thought of as a bit-clear instruction.) ADDI loads register Rd with the sum of the value of the register Rs plus the sign-extended immediate value. SUBI loads register Rd with the result of subtracting register Rs from the sign-extended immediate value. (That is, immed - Rs, not Rs - immed.) Similar instructions have similar semantics, i.e. the logical instructions have zero-extended values and the arithmetic instructions have sign-extended values.
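The difference between zero-extended and sign-extended immediates can be sketched in Python as follows. This is only a behavioral illustration; the helper names are hypothetical, and registers are modeled as unsigned 16-bit integers.

```python
MASK16 = 0xFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a 16-bit two's-complement word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK16


def zero_extend(value: int, bits: int) -> int:
    """Zero-extend a `bits`-wide field (the upper bits are simply 0)."""
    return value & ((1 << bits) - 1)


def addi(rs: int, imm5: int) -> int:
    return (rs + sign_extend(imm5, 5)) & MASK16       # Rd <- Rs + I


def subi(rs: int, imm5: int) -> int:
    return (sign_extend(imm5, 5) - rs) & MASK16       # Rd <- I - Rs (not Rs - I)


def xori(rs: int, imm5: int) -> int:
    return rs ^ zero_extend(imm5, 5)                  # Rd <- Rs XOR I


def andni(rs: int, imm5: int) -> int:
    return rs & ~zero_extend(imm5, 5) & MASK16        # Rd <- Rs AND ~I (bit clear)
```

With the immediate field `0b11111`, `addi` adds -1 while `xori` flips the low five bits, which is the distinction the paragraph above draws between arithmetic and logical instructions.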
For Load and Store instructions, the effective address of the operand to be read or written is calculated by adding the value in register Rs to the sign-extended immediate value. The value is loaded into or stored from register Rd. The STU instruction, Store with Update, acts like Store but also writes the effective address back to Rs.
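A minimal behavioral sketch of LD, ST, and STU follows. It assumes a word-addressed memory modeled as a Python dictionary and a register file modeled as a list of eight integers; `execute_mem` and `effective_address` are illustrative names only.

```python
MASK16 = 0xFFFF


def sext5(imm: int) -> int:
    """Interpret a 5-bit field as a signed two's-complement integer."""
    imm &= 0x1F
    return imm - 0x20 if imm & 0x10 else imm


def effective_address(rs_val: int, imm5: int) -> int:
    """Rs + I(sign ext.), wrapped to 16 bits."""
    return (rs_val + sext5(imm5)) & MASK16


def execute_mem(op: str, regs: list, mem: dict, rd: int, rs: int, imm5: int) -> None:
    """Apply one LD/ST/STU to `regs` and `mem` in place."""
    addr = effective_address(regs[rs], imm5)
    if op == "LD":
        regs[rd] = mem.get(addr, 0)   # assumption: unwritten memory reads as 0
    elif op == "ST":
        mem[addr] = regs[rd]
    elif op == "STU":
        mem[addr] = regs[rd]          # store the old Rd first...
        regs[rs] = addr               # ...then update Rs with the address
    else:
        raise ValueError(f"not a memory instruction: {op}")
```

Storing before updating matters when Rd and Rs name the same register: the value written to memory is the register's value before the update.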
The syntax of the I-format 1 instructions is given in the instruction table below.
I-format 2
5 bits | 3 bits | 8 bits |
Op Code | Rs | Immediate |
The Load Byte Immediate instruction loads Rs with a sign-extended 8-bit immediate value.
The Shift-and-Load-Byte-Immediate instruction shifts Rs 8 bits to the left, and replaces the lower 8 bits with the immediate value.
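Together, these two instructions can build an arbitrary 16-bit constant in two steps. The sketch below models that pairing in Python; the function names are hypothetical.

```python
MASK16 = 0xFFFF


def lbi(imm8: int) -> int:
    """LBI: Rs <- I(sign ext.)."""
    imm8 &= 0xFF
    return (imm8 - 0x100 if imm8 & 0x80 else imm8) & MASK16


def slbi(rs: int, imm8: int) -> int:
    """SLBI: Rs <- (Rs << 8) OR I(zero ext.)."""
    return ((rs << 8) | (imm8 & 0xFF)) & MASK16


def load_constant(value: int) -> int:
    """Build a 16-bit constant: LBI with the high byte, then SLBI with the low byte."""
    high, low = (value >> 8) & 0xFF, value & 0xFF
    return slbi(lbi(high), low)
```

The sign extension performed by LBI does not disturb the result, because SLBI shifts the extended upper byte out of the register.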
The format of these instructions is given in the LBI and SLBI rows of the instruction table below.
The Jump-Register instruction loads the PC with the value of register Rs + signed immediate. The Jump-And-Link-Register instruction does the same and also saves the return address (i.e., the address of the JALR instruction plus one) in the link register R7. The format of these instructions is given in the JR and JALR rows of the instruction table below.
The branch instructions test a general purpose register for some condition. The available conditions are: equal to zero, not equal to zero, less than zero, and greater than or equal to zero. If the condition holds, the signed immediate is added to the address of the next sequential instruction and loaded into the PC. The format of the branch instructions is given in the BEQZ, BNEZ, BLTZ, and BGEZ rows of the instruction table below.
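The next-PC selection for register jumps and conditional branches can be sketched in Python as shown here. The function names are illustrative, not taken from the specification, and the PC is assumed to be 16 bits wide.

```python
MASK16 = 0xFFFF  # assumption: 16-bit PC


def sext8(imm: int) -> int:
    """Interpret an 8-bit field as a signed two's-complement integer."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm


def branch_taken(op: str, rs_val: int) -> bool:
    """Evaluate the branch condition on a 16-bit register value."""
    negative = bool(rs_val & 0x8000)          # sign bit of a two's-complement word
    return {
        "BEQZ": rs_val == 0,
        "BNEZ": rs_val != 0,
        "BLTZ": negative,
        "BGEZ": not negative,
    }[op]


def next_pc(op: str, pc: int, rs_val: int, imm8: int) -> int:
    """Branches: PC + 1 + I if the condition holds, otherwise PC + 1."""
    offset = sext8(imm8) if branch_taken(op, rs_val) else 0
    return (pc + 1 + offset) & MASK16


def jalr(pc: int, rs_val: int, imm8: int) -> tuple[int, int]:
    """JALR: return (new PC, link value for R7). JR uses only the first element."""
    return (rs_val + sext8(imm8)) & MASK16, (pc + 1) & MASK16
```

Note that branch targets are relative to PC + 1, whereas JR and JALR targets are relative to the value in Rs.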
R-format
5 bits | 3 bits | 3 bits | 3 bits | 2 bits |
Op Code | Rs | Rt | Rd | Op Code Extension |
The ADD instruction performs signed addition. The SUB instruction subtracts Rs from Rt. (Not Rs - Rt.) The set instructions SEQ, SLT, and SLE compare the values in Rs and Rt and set the destination register Rd to 0x1 if the comparison is true, and 0x0 if the comparison is false. SEQ checks for Rs equal to Rt, SLT checks for Rs less than Rt, and SLE checks for Rs less than or equal to Rt. (Rs and Rt are two's complement numbers.) The set instruction SCO will set Rd to 0x1 if Rs plus Rt would generate a carry-out from the most significant bit; otherwise it sets Rd to 0x0. The Bit-Reverse instruction, BTR, takes a single operand Rs and copies it to Rd with the order of its bits reversed left to right; i.e. bit 0 goes to bit 15, bit 1 goes to bit 14, etc.
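The less obvious R-format operations (reversed subtraction, signed comparison, carry-out, and bit reversal) are sketched below in Python as a reference model. The function names are hypothetical, and registers are modeled as unsigned 16-bit integers.

```python
MASK16 = 0xFFFF


def to_signed(x: int) -> int:
    """Read a 16-bit word as a two's-complement integer."""
    return x - 0x10000 if x & 0x8000 else x


def sub(rs: int, rt: int) -> int:
    return (rt - rs) & MASK16                    # Rd <- Rt - Rs (not Rs - Rt)


def seq(rs: int, rt: int) -> int:
    return int(rs == rt)


def slt(rs: int, rt: int) -> int:
    return int(to_signed(rs) < to_signed(rt))    # signed comparison


def sle(rs: int, rt: int) -> int:
    return int(to_signed(rs) <= to_signed(rt))


def sco(rs: int, rt: int) -> int:
    return int(rs + rt > MASK16)                 # carry out of bit 15


def btr(rs: int) -> int:
    """Reverse the 16 bits: bit 0 goes to bit 15, bit 1 to bit 14, and so on."""
    return int(f"{rs & MASK16:016b}"[::-1], 2)
```

A model like this can serve as a source of expected values when checking ALU simulation results; for instance, `slt(0xFFFF, 1)` is 1 because 0xFFFF is read as -1.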
The syntax of the R-format ALU and shift instructions is given in the instruction table below.
Special Instructions
The HALT instruction halts the processor. The HALT instruction and all older instructions execute normally, but the instruction after the halt will never execute. The PC is left pointing to the instruction directly after the halt.
The No-operation instruction occupies a position in the pipeline, but does nothing.
The syntax of these instructions is given in the HALT and NOP rows of the instruction table below.
Instruction Counter and Interrupt
These instructions are used with the extra-credit interrupt mechanism. These instructions should remain equivalent to NOP until the rest of the design has been completed and thoroughly tested.
SIIC sets the Instruction Issue Counter to the value specified in Rs. The 16-bit Instruction Issue Counter will then start decrementing with each subsequent instruction issued until it has decremented to zero. If it is loaded with zero, it will remain zero and will not generate any interrupt. The timing of the load is such that, if the IIC is loaded with a one, then exactly one instruction after the SIIC will issue prior to the interrupt being generated. If loaded with a two, exactly two instructions will issue, and so forth. When the interrupt is generated, the EPC register will be loaded with the address of the next sequential instruction to be executed, and PC will be loaded with the constant "1".
RTI returns from an interrupt by loading the PC from the value in the EPC register.
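The counter behavior described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a reference implementation: the class and method names are hypothetical, `issue` is called once for every instruction issued after the SIIC, and the "next sequential instruction to be executed" is taken to be whatever PC the issuing instruction would otherwise hand to fetch.

```python
HANDLER_PC = 1  # the constant loaded into PC when the interrupt fires


class IssueCounter:
    """Behavioral model of the IIC and EPC registers."""

    def __init__(self) -> None:
        self.iic = 0
        self.epc = 0

    def siic(self, rs_val: int) -> None:
        """SIIC: IIC <- Rs (16 bits). Loading zero disables the interrupt."""
        self.iic = rs_val & 0xFFFF

    def issue(self, next_pc: int) -> int:
        """Account for one issued instruction; return the PC to fetch next."""
        if self.iic == 0:
            return next_pc            # counter stopped: nothing happens
        self.iic -= 1
        if self.iic == 0:             # decremented from one to zero
            self.epc = next_pc        # remember where to resume
            return HANDLER_PC
        return next_pc

    def rti(self) -> int:
        """RTI: PC <- EPC."""
        return self.epc
```

Loading the counter with two lets exactly two instructions issue after the SIIC; the second of them redirects the fetch to address 1 and records its successor in EPC.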
The syntax of these instructions, along with that of all other WISC-SP06 instructions, is:
Instruction Format | Syntax | Semantics |
00000 xxxxxxxxxxx | HALT | Cease instruction issue |
00001 xxxxxxxxxxx | NOP | None |
01000 sss ddd iiiii | ADDI Rd, Rs, immediate | Rd <- Rs + I(sign ext.) |
01001 sss ddd iiiii | SUBI Rd, Rs, immediate | Rd <- I(sign ext.) - Rs |
01010 sss ddd iiiii | XORI Rd, Rs, immediate | Rd <- Rs XOR I(zero ext.) |
01011 sss ddd iiiii | ANDNI Rd, Rs, immediate | Rd <- Rs AND ~I(zero ext.) |
10100 sss ddd iiiii | ROLI Rd, Rs, immediate | Rd <- Rs <<(rotate) I(lowest 4 bits) |
10101 sss ddd iiiii | SLLI Rd, Rs, immediate | Rd <- Rs << I(lowest 4 bits) |
10110 sss ddd iiiii | SRAI Rd, Rs, immediate | Rd <- Rs >>(arithmetic) I(lowest 4 bits) |
10111 sss ddd iiiii | SRLI Rd, Rs, immediate | Rd <- Rs >> I(lowest 4 bits) |
10000 sss ddd iiiii | ST Rd, Rs, immediate | Mem[Rs + I(sign ext.)] <- Rd |
10001 sss ddd iiiii | LD Rd, Rs, immediate | Rd <- Mem[Rs + I(sign ext.)] |
10011 sss ddd iiiii | STU Rd, Rs, immediate | Mem[Rs + I(sign ext.)] <- Rd; Rs <- Rs + I(sign ext.) |
11001 sss xxx ddd xx | BTR Rd, Rs | Rd[bit i] <- Rs[bit 15-i] for i=0..15 |
11011 sss ttt ddd 00 | ADD Rd, Rs, Rt | Rd <- Rs + Rt |
11011 sss ttt ddd 01 | SUB Rd, Rs, Rt | Rd <- Rt - Rs |
11011 sss ttt ddd 10 | XOR Rd, Rs, Rt | Rd <- Rs XOR Rt |
11011 sss ttt ddd 11 | ANDN Rd, Rs, Rt | Rd <- Rs AND ~Rt |
11010 sss ttt ddd 00 | ROL Rd, Rs, Rt | Rd <- Rs <<(rotate) Rt (lowest 4 bits) |
11010 sss ttt ddd 01 | SLL Rd, Rs, Rt | Rd <- Rs << Rt (lowest 4 bits) |
11010 sss ttt ddd 10 | SRA Rd, Rs, Rt | Rd <- Rs >>(arithmetic) Rt (lowest 4 bits) |
11010 sss ttt ddd 11 | SRL Rd, Rs, Rt | Rd <- Rs >> Rt (lowest 4 bits) |
11100 sss ttt ddd xx | SEQ Rd, Rs, Rt | if (Rs == Rt) then Rd <- 1 else Rd <- 0 |
11101 sss ttt ddd xx | SLT Rd, Rs, Rt | if (Rs < Rt) then Rd <- 1 else Rd <- 0 |
11110 sss ttt ddd xx | SLE Rd, Rs, Rt | if (Rs <= Rt) then Rd <- 1 else Rd <- 0 |
11111 sss ttt ddd xx | SCO Rd, Rs, Rt | if (Rs + Rt) generates carry out then Rd <- 1 else Rd <- 0 |
01100 sss iiiiiiii | BEQZ Rs, immediate | if (Rs == 0) then PC <- PC + 1 + I(sign ext.) |
01101 sss iiiiiiii | BNEZ Rs, immediate | if (Rs != 0) then PC <- PC + 1 + I(sign ext.) |
01110 sss iiiiiiii | BLTZ Rs, immediate | if (Rs < 0) then PC <- PC + 1 + I(sign ext.) |
01111 sss iiiiiiii | BGEZ Rs, immediate | if (Rs >= 0) then PC <- PC + 1 + I(sign ext.) |
11000 sss iiiiiiii | LBI Rs, immediate | Rs <- I(sign ext.) |
10010 sss iiiiiiii | SLBI Rs, immediate | Rs <- (Rs << 8) OR I(zero ext.) |
00100 ddddddddddd | J displacement | PC <- PC + 1 + D(sign ext.) |
00101 sss iiiiiiii | JR Rs, immediate | PC <- Rs + I(sign ext.) |
00110 ddddddddddd | JAL displacement | R7 <- PC + 1; PC <- PC + 1 + D(sign ext.) |
00111 sss iiiiiiii | JALR Rs, immediate | R7 <- PC + 1; PC <- Rs + I(sign ext.) |
00010 sss xxxxxxxx | NOP / SIIC Rs | IIC <- Rs |
00011 xxxxxxxxxxx | NOP / RTI | PC <- EPC |
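The bit layouts in the table map directly onto a field decoder. The Python sketch below groups opcodes by format and extracts the corresponding fields; the set names and the `decode` function are illustrative only, and the opcode groupings are read off the table above.

```python
# Opcode groups, read off the instruction table.
J_FORMAT = {0b00100, 0b00110}                                   # J, JAL
I2_FORMAT = {0b00101, 0b00111,                                  # JR, JALR
             0b01100, 0b01101, 0b01110, 0b01111,                # branches
             0b11000, 0b10010, 0b00010}                         # LBI, SLBI, SIIC
I1_FORMAT = {0b01000, 0b01001, 0b01010, 0b01011,                # ADDI..ANDNI
             0b10100, 0b10101, 0b10110, 0b10111,                # shifts/rotate
             0b10000, 0b10001, 0b10011}                         # ST, LD, STU
R_FORMAT = {0b11001, 0b11011, 0b11010,                          # BTR, ALU, shifts
            0b11100, 0b11101, 0b11110, 0b11111}                 # SEQ..SCO


def decode(word: int) -> dict:
    """Split a 16-bit instruction word into its named fields."""
    opcode = (word >> 11) & 0x1F
    rs = (word >> 8) & 0x7
    fields = {"opcode": opcode}
    if opcode in J_FORMAT:
        fields["disp"] = word & 0x7FF
    elif opcode in I2_FORMAT:
        fields.update(rs=rs, imm=word & 0xFF)
    elif opcode in I1_FORMAT:
        fields.update(rs=rs, rd=(word >> 5) & 0x7, imm=word & 0x1F)
    elif opcode in R_FORMAT:
        fields.update(rs=rs, rt=(word >> 5) & 0x7,
                      rd=(word >> 2) & 0x7, ext=word & 0x3)
    return fields  # HALT, NOP, and RTI carry no operand fields
```

For example, `decode(0x415F)` yields opcode `0b01000` (ADDI) with Rs = 1, Rd = 2, and a raw immediate of `0x1F`; sign or zero extension of the immediate is left to the execute step.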
You should use the modules you designed in previous homeworks for this project. If there were errors in your modules, you need to fix them. An error in the results of the final project that is caused by an error in an earlier module will be considered to be an error in the project. Do not rely on our having found all errors in earlier work. In addition to the correction of errors, you may need to make other modifications.
For the single-cycle design, use the single-cycle memory model that is supplied here.
Since you will need to fetch instructions as well as read or write data in the same cycle, use two memories -- one for instruction memory and one for data memory.
For the demo, you will be asked to run the test programs here.
After you have completed the single-cycle implementation, you will next implement a pipelined version of the architecture. The pipeline will have five stages: Fetch (F), Decode (D), Execute (X), Memory (M), and Write-back (W).
Be sure that the non-pipelined version is functional before you try the pipelined design. While designing the non-pipelined version, make considerations that will allow for an easy conversion to the pipelined version.
At this step, you may continue to use the same one-cycle memories that you used in the non-pipelined design.
You do not need to demo this version; this is just a stepping-stone to the next versions. However, if you do not succeed in making a later version work, you will want to be able to at least demonstrate that you got this version working.
Your design will be graded on functionality first, and performance second. Thus, you should get your pipelined processor working before trying to optimize it. For example, your initial design will stall on all branch and control hazards. After you get the basic pipeline design to work, then add optimizations. Your goal is to reduce the CPI, or cycles per instruction. You will be graded in part on the number of cycles you take to execute the test programs. Increasing the clock rate is a secondary concern. However, you must adhere to the following rule: You may not have more than one of the following blocks in series in the same clock cycle:
The first optimization to implement is register forwarding. True data dependences are very common and compilers have only limited ability to schedule around them. Your register file (from the homework) already implements forwarding within the Decode cycle; the additional forwarding paths to add run from the beginning of the M stage and from the beginning of the W stage into the beginning of the X stage.
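The selection logic for one X-stage source operand might look like the following Python sketch. It is a simplified model under stated assumptions: the `Latch` type and its field names are hypothetical stand-ins for whatever pipeline registers your design carries into the M and W stages, and the load-use case (where the M-stage instruction is a load whose data is not yet available) is not handled here and would still need a stall.

```python
from typing import NamedTuple


class Latch(NamedTuple):
    """Hypothetical view of the instruction sitting at the start of M or W."""
    writes_reg: bool   # does that instruction write a register?
    dest: int          # destination register number (0..7)


def forward_select(src_reg: int, m_stage: Latch, w_stage: Latch) -> str:
    """Pick the source for one X-stage operand: 'M', 'W', or 'RF'.

    The M stage holds the younger instruction, so it takes priority
    over the W stage when both write the same register.
    """
    if m_stage.writes_reg and m_stage.dest == src_reg:
        return "M"
    if w_stage.writes_reg and w_stage.dest == src_reg:
        return "W"
    return "RF"  # no hazard: use the value read from the register file
```

Every register is checked without exception, since nothing in the specification above reserves a register that always reads as zero.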
Another required optimization is to predict all branches to be "not taken". This essentially means that your pipeline should continue to execute sequentially until the branch resolves, and then "squash" instructions after the branch if the branch was actually taken.
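To get a feel for what predict-not-taken costs, the cycle count can be estimated with a simple formula. The Python sketch below is a back-of-the-envelope model, not a grading formula: it ignores data-hazard stalls, and it assumes that the number of squashed instructions per taken branch depends only on the stage in which your design resolves branches (one if in D, two if in X, three if in M).

```python
# Instructions fetched behind a taken branch before it resolves
# (assumption: one instruction is fetched per cycle).
SQUASHED_PER_TAKEN = {"D": 1, "X": 2, "M": 3}


def pipeline_cycles(instructions: int, taken_branches: int,
                    resolve_stage: str = "X", depth: int = 5) -> int:
    """Cycles = instructions + pipeline fill + squashed slots."""
    fill = depth - 1
    return instructions + fill + taken_branches * SQUASHED_PER_TAKEN[resolve_stage]


def cpi(instructions: int, taken_branches: int, resolve_stage: str = "X") -> float:
    """Cycles per instruction under the same assumptions."""
    return pipeline_cycles(instructions, taken_branches, resolve_stage) / instructions
```

Under these assumptions, a 100-instruction program with 10 taken branches takes 124 cycles when branches resolve in X and 114 cycles when they resolve in D, which shows why resolving branches earlier lowers CPI.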
Remember: Making your design work correctly is the most important
thing for your grade. Optimization is to be done afterward.