Homework 4

Due 03/27
Weight: 15%

Important


Problem 1

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source and your testing results.

Use the exact same template as the one used in homework 3.
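Before writing any Verilog, it can help to pin down the intended read behavior in a small behavioral model. The sketch below is Python, not the required Verilog, and it is not the HW3 template: the class names, the single read port, the 8-register depth and the 16-bit width are all assumptions made for illustration.

```python
class RegFile:
    """Hypothetical stand-in for the unchanged inner register file."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs

    def read(self, addr):
        # A read sees only values committed on earlier clock edges.
        return self.regs[addr]

    def clock(self, write_en, write_addr, write_data):
        # The write takes effect at the clock edge.
        if write_en:
            self.regs[write_addr] = write_data & 0xFFFF


class BypassRegFile:
    """Hypothetical wrapper: same operations, plus same-cycle bypass."""

    def __init__(self, inner):
        self.inner = inner

    def read(self, addr, write_en, write_addr, write_data):
        # Forward the incoming write data when it targets the register being read.
        if write_en and addr == write_addr:
            return write_data & 0xFFFF
        return self.inner.read(addr)

    def clock(self, write_en, write_addr, write_data):
        self.inner.clock(write_en, write_addr, write_data)


rf = BypassRegFile(RegFile())
same_cycle = rf.read(3, True, 3, 0xBEEF)   # bypassed value, before the clock edge
no_write = rf.read(3, False, 3, 0xBEEF)    # write disabled: old contents
rf.clock(True, 3, 0xBEEF)
after_edge = rf.read(3, False, 0, 0)       # value now comes from the inner file
```

Cases like these three (bypass hit, write disabled, read after the edge) are the kind of representative inputs worth showing in a simulation trace.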

What to submit:

  1. Turn in neatly and legibly drawn schematics of your design. Represent your HW3 register as a module and show any modifications/additions.
  2. Annotated simulation trace of the complete design. Pick representative cases for your simulation input to turn in. Your trace must show the following:
  3. Any modifications to the testbench if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). Modelsim will write the text output to a file called transcript in your project directory.
  4. Electronically submit your verilog source code. All of your source code must be in one tgz called hw4-p1.tgz. Vcheck output must be included in the tgz.

Problem 2

Indicate all of the true, anti-, and output-dependences in the following segment of MIPS assembly code:

    xor    $1, $2, $3
    and    $4, $5, $6
    sub    $7, $4, $5
    add    $5, $1, $5
    sw     $4, 100($7)
    or     $4, $7, $4 

For the code above, which of the dependences will manifest themselves as hazards in the pipeline in Figure 6.17 on page 395 of COD3e? How are these hazards resolved in this pipeline? Assuming the 'xor' instruction enters fetch (F) in cycle 1, in what cycle does the 'or' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)

How does your answer change if you consider the pipeline in figure 6.36, on page 416 of COD3e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)
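As a reminder of the three dependence kinds, here is a minimal Python sketch that classifies them for a short hypothetical sequence (deliberately not the one in this problem). The `Instr` record and the `dependences` helper are illustrative names. The sketch looks only at register names and skips a pair when another instruction in between overwrites the register.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]     # registers read


def written_between(prog: List[Instr], i: int, j: int, reg: str) -> bool:
    """True if some instruction strictly between i and j writes reg."""
    return any(prog[k].dest == reg for k in range(i + 1, j))


def dependences(prog: List[Instr]):
    found = []
    for j, later in enumerate(prog):
        for i in range(j):
            earlier = prog[i]
            # True dependence (RAW): later reads what earlier wrote.
            if earlier.dest in later.srcs and not written_between(prog, i, j, earlier.dest):
                found.append(("true", i, j, earlier.dest))
            # Anti-dependence (WAR): later overwrites what earlier read.
            if later.dest in earlier.srcs and not written_between(prog, i, j, later.dest):
                found.append(("anti", i, j, later.dest))
            # Output dependence (WAW): both write the same register.
            if later.dest is not None and later.dest == earlier.dest \
                    and not written_between(prog, i, j, later.dest):
                found.append(("output", i, j, later.dest))
    return found


example = [
    Instr("add $1, $2, $3", "$1", ("$2", "$3")),
    Instr("sub $4, $1, $5", "$4", ("$1", "$5")),
    Instr("or  $5, $6, $7", "$5", ("$6", "$7")),
    Instr("and $4, $8, $9", "$4", ("$8", "$9")),
]
result = dependences(example)
for kind, i, j, reg in result:
    print(f"{kind:6s} on {reg}: '{example[i].text}' -> '{example[j].text}'")
```

Which of these dependences become hazards depends on the pipeline, which is the point of the question above.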

Problem 3

Consider the following code sequence and the datapath in figure 6.27 on page 404 of COD3e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.


            beq    $2, $3, foo
            add    $3, $4, $5
            sub    $5, $6, $7
            or     $7, $8, $9
    foo:    and    $5, $6, $7 


Problem 4

Consider the pipeline in Figure 6.27 on page 404; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 461. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:


        beq $1, $2, TARGET  #assume that the branch is taken
        lw  $3, 40($4)
        add $2, $3, $4
        sw  $2, 40($4)
TARGET: or  $1, $1, $2



Problem 5

Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:


Fragment 1:

        movei $3, 10  // End-of-loop count
        .
        .
        .
        addi $2, $2, 1
        beq $2, $3, Target
        .
        .
        .
Target: ...

Fragment 2:

        add $2, $2, $8
        beq $2, $3, Target
        lw $4, 0($7)
        .
        .
        .
Target: sub $4, $5, $6
        ...

Fragment 3:

        add $2, $2, $8
        beq $2, $3, Target
        lw $4, 0($8)
        .
        .
        .
Target: lw $5, 0($7)
        ...

Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)
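The averaging asked for here is a weighted sum over the taken and not-taken cases. The Python sketch below shows the arithmetic with deliberately different, hypothetical numbers (50% taken, a four-cycle penalty reduced to three). The function name and the `wasted_slot_cycles` parameter are assumptions for illustration. The model assumes a not-taken branch costs nothing extra and that a delay slot holding no useful work costs one cycle per branch.

```python
def expected_branch_penalty(p_taken, taken_penalty, wasted_slot_cycles=0.0):
    """Average extra cycles per branch under predict-not-taken.

    p_taken            -- fraction of branches that are taken
    taken_penalty      -- cycles lost on a taken branch
    wasted_slot_cycles -- average cycles lost per branch to a delay slot
                          that does no useful work (0 if it is always useful)
    """
    return p_taken * taken_penalty + wasted_slot_cycles


# Hypothetical numbers, not the ones in this problem.
baseline = expected_branch_penalty(0.5, 4)                       # no delay slot
slot_useful = expected_branch_penalty(0.5, 3)                    # slot always does useful work
slot_nop = expected_branch_penalty(0.5, 3, wasted_slot_cycles=1.0)  # slot holds a nop

print(baseline - slot_useful)   # cycles saved per branch when the slot is useful
print(baseline - slot_nop)      # negative: cycles lost when the slot is a nop
```

How much of the slot is wasted for a given fragment depends on where the filling instruction comes from, so that term has to be worked out case by case.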

While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?


Problem 6

None. Moved to HW5. Leave problem 6 empty in your submitted homework.


Problem 7

Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your implementation.

Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.


nobody      halt
nobody      nop

abhishek    addi
aidan       subi
akumar      xori
bemis       andni
bohl        roli
chue        slli
diep        rori
duwe-iii    srli
flanagan    st
fortin      ld
fridlund    stu
hollinge    btr
janaki      add
janderso    sub
jbyrne      xor
jhugo       andn
ju          rol
kirti       sll
luancong    ror
mcjunkin    srl
mckinley    seq
pulliam     slt
richard     sle
roberto     sco
ruhland     beqz
shan-hsi    bnez
simanek     bltz
sinclair    bgez
usha        lbi
vaishali    slbi
vviswana    j
waclawik    jr
all         jal
all         jalr


To get you started, below are two example tests for the add instruction.

add_0.asm


lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt

add_1.asm

lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.

The work flow we will follow is:

  1. Write test in WISC-SP08 assembly language.
  2. Assemble using assembler assemble.sh
  3. Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator

Read the following two documents on how to use the assembler and simulator:

Below is a short demo:

prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt% wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information

Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished

prompt%

The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
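One way to check a trace is to predict the register values by hand or with a tiny model. In the demo above, `lbi r1, 255` is printed as `lbi r1, -1` and leaves R1 at 0xffff, which is consistent with the 8-bit immediate being sign-extended to 16 bits. The Python sketch below models just that and a 16-bit wraparound add. Both behaviors are inferred from the trace and should be confirmed against the ISA specification, and the helper names are illustrative.

```python
MASK16 = 0xFFFF


def lbi(imm8):
    """Load an 8-bit immediate, sign-extended to 16 bits (inferred from the trace)."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8


def add16(a, b):
    """16-bit add with wraparound; the carry out is discarded."""
    return (a + b) & MASK16


# Predicted register values for add_0.asm.
r1 = lbi(255)
r2 = lbi(255)
r3 = add16(r1, r2)
print(f"R1 0x{r1:04x} R2 0x{r2:04x} R3 0x{r3:04x}")
```

The printed values can be compared against the R1, R2 and R3 fields on the trace line after the `add`.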

What you need to do:

  • Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
  • Use your discretion to decide how many tests you need
  • Identify corner cases. Think about possible bugs in the hardware.
  • In addition to your assigned instruction, everyone must write tests for the jal and jalr instructions
  • Write comments in your assembly code explaining what the test is doing
  • The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!

I will make all tests available to everyone, so you can use them to debug and test your Verilog implementation as we near the demo 1 deadline. One of the first things you must do after putting together your full processor is run each of these tests to check each individual instruction.

What to submit

  • A single tgz called hw4-p7.tgz containing all the tests you wrote. This file must contain ONLY the .asm files
  • Written explanation of what your tests do and justification why your set of tests is comprehensive
  • This problem alone is due 03/15. However I will accept late homework until 03/27 for this problem. You are highly encouraged to finish this problem by 03/15. This will benefit everyone in class.

Page last modified on March 17, 2008
