Homework 4


Due 03/19
Weight: 15%

Use this cover sheet as the first page of your homework. Download the Word doc, fill in your name, and print it, or handwrite the details in large letters. Cover sheet: word doc, pdf

1.  Important

  • Read the Verilog rules check on the Tools page. Your program must pass Vcheck.
  • Review The elements of Logic Design Style
  • Homework is due at start of class
  • Problems 1, 2 and 3 MUST be done with your project partner and must be submitted to the group dropbox.
  • Problems 4, 5, 6, 7, 8, 9, 10, 11 and 12 MUST be done ALONE.
  • Problem 12 must be submitted online alone to the individual dropbox.
  • Both partners must turn in written copies for all the problems.
  • You must abide by the Verilog file naming conventions
  • Verilog problems 1, 2, 3 are 15 points each.
  • Problem 12 carries 15 points as well.
  • Problems 4, 5, 6, 7, 8, 9, 10, 11 carry 10 points each.

2.  Grading scheme (total- 100)

  • Problems 1-3 (total- 45):
    • Points for verilog part of each problem- 10
    • Points for written part of each problem- 5
    • Total for each problem- 15
  • Problem 12 (total- 15):
    • Points for submitting .asm files- 10
    • Points for written part - 5
  • Problems 4-11 (total- 40):
    • 4 (random) out of the 8 problems will be graded
    • Points for each graded problem- 10

3.  Problem 1

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source and your testing results.

  • Call this module rf_bypass and use the template rf_bypass.v with *NO* modifications in the interface
  • Use rf_bypass_hier.v with *NO* modifications to instantiate rf_bypass
  • For verification, write a testbench rf_bypass_hier_bench.v which instantiates rf_bypass_hier.
  • Turn in all verilog files including rf.v and others which were already submitted for HW3
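The bypass behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not the Verilog you are asked to write: the class names, the port names (read1, read2, write_en, write_reg, write_data), and the register count of 8 are assumptions, and your HW3 register file interface may differ.

```python
class RegFile:
    """Plain register file: reads see the value stored before this cycle's write."""

    def __init__(self, num_regs=8):  # register count is an assumption
        self.regs = [0] * num_regs

    def cycle(self, read1, read2, write_en, write_reg, write_data):
        # Reads sample the old contents; the write lands at the end of the cycle.
        out1, out2 = self.regs[read1], self.regs[read2]
        if write_en:
            self.regs[write_reg] = write_data
        return out1, out2


class RegFileBypass:
    """Wrapper with the same interface that forwards a same-cycle write to the reads."""

    def __init__(self, num_regs=8):
        self.inner = RegFile(num_regs)  # the inner register file is left unchanged

    def cycle(self, read1, read2, write_en, write_reg, write_data):
        out1, out2 = self.inner.cycle(read1, read2, write_en, write_reg, write_data)
        # Bypass: a read port that addresses the register being written gets the new data.
        if write_en and read1 == write_reg:
            out1 = write_data
        if write_en and read2 == write_reg:
            out2 = write_data
        return out1, out2


plain = RegFile()
bypassed = RegFileBypass()
# Write 0xBEEF to register 3 while reading registers 3 and 4 in the same cycle.
plain_result = plain.cycle(3, 4, True, 3, 0xBEEF)
bypass_result = bypassed.cycle(3, 4, True, 3, 0xBEEF)
# One cycle later, both versions return the stored value.
plain_next = plain.cycle(3, 4, False, 0, 0)
bypass_next = bypassed.cycle(3, 4, False, 0, 0)
```

A testbench for the real module would probably cover the same cases: read and write to the same register in one cycle, read and write to different registers, and a matching address with write enable low.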

What to submit:

  1. Physical copy
    1. Turn in neatly and legibly drawn schematics of your design. Represent your HW3 register as a module and show any modifications/additions.
    2. Annotated simulation trace of the complete design. Pick representative cases for your simulation input to turn in.
  2. Electronic submission instructions
    1. All verilog files
    2. Submission Instructions

4.  Problem 2

Read the synthesis tutorial on the Synthesis page.

Synthesize your register file from homework 3

Synthesize will create the synth directory which will include rf.syn.v, area report, timing report, etc. Make sure that in the area report no cell has an area of zero

What to submit:

  1. Physical copy: on the handwritten homework you turn in, fill in the following:
    1. Total area:
    2. Worst case slack:
  2. Electronic submission
    1. Submit all verilog files, the synthesized file 'rf.syn.v', and the outputs area_report.txt and timing_report.txt
    2. Submission Instructions

5.  Problem 3

Read the synthesis tutorial on the Synthesis page.

Synthesize your FIFO from homework 3.

Synthesize will create the synth directory which will include fifo.syn.v, area report, timing report, etc. Make sure that in the area report no cell has an area of zero

What to submit:

  1. Physical copy: on the handwritten homework you turn in, fill in the following:
    1. Total area:
    2. Worst case slack:
  2. Electronic submission
    1. Submit all verilog files, the synthesized file 'fifo.syn.v', and the outputs area_report.txt and timing_report.txt
    2. Submission Instructions

6.  Problem 4

Indicate all of the true, anti-, and output-dependences in the following segment of MIPS assembly code:

    xor    $1, $2, $3
    and    $4, $5, $6
    sub    $7, $4, $5
    add    $5, $1, $5
    sw     $4, 100($7)
    or     $4, $7, $4 

For the code above, which of the dependences will manifest themselves as hazards in the pipeline in Figure 4.41 on page 355 of COD4e? How are these hazards resolved in this pipeline? Assuming the 'xor' instruction enters fetch (F) in cycle 1, in what cycle does the 'or' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)

How does your answer change if you consider the pipeline in 4.60, on page 375 of COD4e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)
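As a reminder of how the three dependence types are defined, here is a minimal Python sketch that classifies them for a short instruction list. The four-instruction sequence and the function name find_dependences are hypothetical and are not the code from this problem. The sketch reports a true dependence only when no write in between overwrites the register; anti- and output-dependences are simply listed pairwise.

```python
def find_dependences(instrs):
    """instrs: list of (name, dest, sources); dest is None when nothing is written."""
    deps = []
    for i, (_, dest_i, srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            _, dest_j, srcs_j = instrs[j]
            # A true dependence needs the value from i to reach j unchanged.
            overwritten = any(instrs[k][1] == dest_i for k in range(i + 1, j))
            if dest_i is not None and dest_i in srcs_j and not overwritten:
                deps.append(("RAW", i, j, dest_i))   # j reads what i wrote
            if dest_j is not None and dest_j in srcs_i:
                deps.append(("WAR", i, j, dest_j))   # j writes what i read
            if dest_i is not None and dest_i == dest_j:
                deps.append(("WAW", i, j, dest_i))   # both write the same register
    return deps


# Hypothetical sequence:
#   0: add $1, $2, $3
#   1: sub $4, $1, $5
#   2: or  $2, $4, $6
#   3: and $4, $7, $8
example = [
    ("add", "$1", ["$2", "$3"]),
    ("sub", "$4", ["$1", "$5"]),
    ("or",  "$2", ["$4", "$6"]),
    ("and", "$4", ["$7", "$8"]),
]
deps = find_dependences(example)
for kind, first, second, reg in deps:
    print(f"{kind}: instruction {first} -> instruction {second} on {reg}")
```

Whether a listed dependence becomes a hazard depends on the pipeline, which is the part the problem asks you to reason about.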


7.  Problem 5

Consider the following assembly program to be executed in a MIPS ISA 5-stage (F, D, X, M, W) pipelined datapath given in Figure 4.51 on page 362 of COD4e:

    I1: add $3,$4,$6
    I2: sub $5,$3,$2
    I3: lw $6,100($5)
    I4: add $5,$6,$3

a) Identify every occurrence and every type of data dependence, True (RAW), Anti (WAR), and Output (WAW), in the above program. Also indicate which register is involved in each data dependence.

b) If this program is executed in a pipelined datapath, create a pipeline timing diagram table (clock cycle numbers as columns and instructions as rows), assuming NO forwarding except that register forwarding is available.

c) Identify all the data hazards that may occur. For each hazard, indicate whether data forwarding (including register forwarding) may be applied to eliminate that hazard. For each hazard, give the two instructions involved, the register involved, and the pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB) whose output will be used for data forwarding.
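The table layout asked for in part b) (clock cycles as columns, instructions as rows) can be sketched as follows. This is only a formatting aid under stated assumptions: the instruction labels and stall counts are hypothetical inputs that you supply, the sketch does not detect hazards, and it assumes a stalled instruction waits in D while the next instruction is fetched during its last D cycle.

```python
def timing_table(names, decode_stalls):
    """Return (name, fetch_cycle, stages) for each instruction.

    decode_stalls[i] is the number of extra cycles instruction i is held in D.
    The counts are chosen by the reader; nothing here computes them.
    """
    rows = []
    fetch = 1  # cycle in which the next instruction is fetched
    for name, stalls in zip(names, decode_stalls):
        stages = ["F"] + ["D"] * (stalls + 1) + ["X", "M", "W"]
        rows.append((name, fetch, stages))
        # The following instruction is fetched during this one's last D cycle.
        fetch += stalls + 1
    return rows


def render(rows):
    """Format the rows as text with one column per clock cycle."""
    last = max(start + len(stages) - 1 for _, start, stages in rows)
    lines = ["instr " + " ".join(f"{c:>2}" for c in range(1, last + 1))]
    for name, start, stages in rows:
        cells = ["  "] * last
        for offset, stage in enumerate(stages):
            cells[start - 1 + offset] = f"{stage:>2}"
        lines.append(f"{name:<5} " + " ".join(cells))
    return "\n".join(lines)


# Hypothetical example: the second instruction waits two extra cycles in D.
rows = timing_table(["i1", "i2", "i3"], [0, 2, 0])
table = render(rows)
print(table)
```

Repeating D for a stalled instruction is one common drawing convention; follow whichever convention is used in class if it differs.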


8.  Problem 6

Consider the following program code:

    lw  $s1, 8($s0)
    sub $s0,$s1,$s2
    add $s0,$s0,$s1

If the above program is executed in the pipelined datapath given in Figure 4.51 on page 362 of COD4e, equipped with full data forwarding (as well as register forwarding), complete the timing diagram table (clock cycle numbers as columns and instructions as rows). Also mark each clock cycle in which data forwarding (F) takes place or a pipeline stall (S) is inserted.


9.  Problem 7

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.


            beq    $2, $3, foo
            add    $3, $4, $5
            sub    $5, $6, $7
            or     $7, $8, $9
    foo:    and    $5, $6, $7 


10.  Problem 8

Consider the pipeline in Figure 4.51 on page 362; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 379. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:


        beq $1, $2, TARGET  #assume that the branch is taken
        lw  $3, 40($4)
        add $2, $3, $4
        sw  $2, 40($4)
TARGET: or  $1, $1, $2
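For reference, the load-use condition that a hazard detection unit of this kind checks can be sketched in Python as below. The function and argument names are illustrative assumptions modeled on the pipeline register fields used in the textbook; the sketch covers only the stall decision and says nothing about how it interacts with a flush, which is what the problem asks.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in D must wait one cycle for a load in X.

    id_ex_mem_read: the instruction in X is a load
    id_ex_rt:       destination register number of that load
    if_id_rs/rt:    source register numbers of the instruction in D
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# Hypothetical register numbers, unrelated to the code sequence above.
stall_needed = load_use_stall(True, 9, 9, 10)     # load writes $9, next reads $9
no_dependence = load_use_stall(True, 9, 10, 11)   # next instruction does not read $9
not_a_load = load_use_stall(False, 9, 9, 10)      # producer is not a load
```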


11.  Problem 9

Consider the following MIPS assembly code segment:


         bne $s1,$s2,LABEL  # $s1 != $s2
         add $t2,$t1,$s1
         sw $t2,4($s1)
         j EXIT
  LABEL: lw $s1,4($s6)
  EXIT:  addi $s1,$s1,4

Assume this code segment runs on the pipelined datapath with data forwarding depicted in Figure 4.65 on page 384 of COD4e, where the branch decision is made in the ID stage.

Assuming $s1 != $s2, a control hazard will occur. Provide a timing diagram table (clock cycle numbers as columns and instructions as rows) showing which phase (F, D, X, M, W) each instruction is in at each clock cycle. If an instruction is flushed from the pipeline, its remaining phases should not appear. If an instruction is stalled for one cycle, its remaining phases are pushed back by one cycle. Mark the clock cycle and the corresponding instruction for every flush or stall. (No branch predictors are used in this problem.)


12.  Problem 10

During the execution of a program, conditional branches have been executed 15 times. The traces of TAKEN(T) and NOT-TAKEN(N) of each branch instruction are listed below:

T-T-N-T-T-T-N-T-N-T-T-N-N-N-T

a) Prediction accuracy for "always NOT TAKEN" =

b) Prediction accuracy for "1 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

c) Prediction accuracy for "2 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

Note: For dynamic predictors (1 bit and 2 bit), assume the first predicted entry as TAKEN (T) and then proceed.
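The three schemes can be sketched in Python as below, using a short hypothetical trace rather than the one given in this problem. The 2-bit predictor here is a saturating counter that starts in the strongest TAKEN state; both the counter encoding and that starting state are assumptions, so check them against the state diagram used in class before relying on the numbers.

```python
def always_not_taken(trace):
    """Outcome list (1 = correct, 0 = incorrect) when every branch is predicted N."""
    return [1 if actual == "N" else 0 for actual in trace]


def one_bit(trace, start="T"):
    """Predict whatever the previous branch did; the first prediction is `start`."""
    prediction, outcomes = start, []
    for actual in trace:
        outcomes.append(1 if prediction == actual else 0)
        prediction = actual  # remember only the most recent outcome
    return outcomes


def two_bit(trace, state=3):
    """Saturating counter: states 0 and 1 predict N, states 2 and 3 predict T.

    Starting in state 3 (strongly taken) is an assumption.
    """
    outcomes = []
    for actual in trace:
        prediction = "T" if state >= 2 else "N"
        outcomes.append(1 if prediction == actual else 0)
        state = min(state + 1, 3) if actual == "T" else max(state - 1, 0)
    return outcomes


def accuracy(outcomes):
    return sum(outcomes) / len(outcomes)


trace = "T-N-N-T-T-N".split("-")  # hypothetical trace, not the one above
static_outcomes = always_not_taken(trace)
one_bit_outcomes = one_bit(trace)
two_bit_outcomes = two_bit(trace)
print(accuracy(static_outcomes), accuracy(one_bit_outcomes), accuracy(two_bit_outcomes))
```

The outcome lists use the same 1/0 convention as parts b) and c).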


13.  Problem 11

High performance datapaths use bypass paths (also known as data forwarding logic) to reduce pipeline stalls. However, bypass paths are relatively expensive, especially in some wire constrained technologies. To reduce the cost (and potential cycle time impact), some architects have explored omitting some of the possible bypass paths. Consider the datapath illustrated below (note that the PC update logic and all control logic is intentionally omitted). This pipelined datapath is similar to the one in the book, but only has bypass paths on one side of the ALU. Assume that the register file intentionally bypasses the value, so that if register Si is read and written in the same cycle, then the read returns the new value. Assume that the control logic bypasses the data as soon as possible using the given forwarding data paths, and stalls in decode otherwise. You may NOT add additional data paths.

In this problem, you will look at how a program snippet performs on this pipeline. Recall that R-format instructions have the form: opcode rd, rs, rt

and I-format instructions have the form: opcode rt, imm(rs) or opcode rt, rs, imm

Use the table given below to show how the given instruction sequence flows through the pipeline and where stalls are necessary to resolve hazards.

Timing Table
Pipeline

Consider the code and pipeline above. Show the execution of this code on the pipeline above. Use the letters, F, D, X, M, and W.

For each cycle where a stall occurs, explain why.


14.  Problem 12

Develop instruction level tests for your processor. In this problem each one of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly, run them on an instruction emulator to make sure what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's verilog implementation and use them to test your implementation.

Each person will be responsible for one instruction (along with the common instructions jal and jalr) and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to individual students.

adam hart		 add 
adam sperling		 addi
alexander larson	 andn
dmitri svetlov		 andni
anjali narayan-chen	 beqz
anthony bublitz		 bgez
benjamin li		 bltz
benjamin moench		 bnez
chandru loganathan	 btr
chen benhamo		 halt
christopher beley	 j
creighton long		 jr
daohang shi		 lbi
drew bollinger		 ld
eric dahl		 rol
fan zhu		         roli
gabriel bautista	 ror
garret handel		 rori
gregory belmonte	 sco
hongkai pan		 seq
jacob hanshaw		 slbi
jacob riley		 sle
james sawicki		 sll
john griffin		 slt
john vennard		 slli
johnathon ender		 srl
jonathan goetz		 srli
joseph peterson		 st
joshua tabor		 stu
junjue wang		 sub
justin krosschell	 subi
justin russo		 xor
kaashyapee jha		 xori
kah lee		         add 
kaushik kannan		 addi
kevin blair		 andn
kit shawn chew		 andni
lalit jain		 beqz
lee stratman		 bgez
louis schultz		 bltz
matthew stilin		 bnez
mengyu yang		 btr
mikkel nielsen		 halt
nathan little		 j
nathaniel williams	 jr
neil van lysel		 lbi
nicholas ambur		 ld
nicholas pjevach	 rol
paul mcbride		 roli
peng liu		 ror
rachel underwood	 rori
raghav mohan		 sco
rajan shah		 seq
rajsekhar venkat         slbi
ranjini mysore nagaraju	 sle
samuel roth		 sll
samuel solovy		 slt
sharath prasad		 slli
shiqin yan		 srl
song bian		 srli
stephanie dewet		 st
steve rossman		 stu
steven gross		 sub
taylor johnston		 subi
tyler bream		 xor
xiufeng xie		 xori
yuan yuan		 add 
yueyang chu		 addi
yujie wu		 andn
yulin shen		 andni
zheng ling		 beqz
zhengtao gong		 bgez
zhenhong liu		 bltz
zhexuan liu		 bnez
zhong xie		 btr



Common instructions to all students - jal, jalr     

To get you started below are two example tests for the add instruction.

add_0.asm


lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt

add_1.asm

lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.

The work flow we will follow is:

  1. Write test in WISC-SP13 assembly language.
  2. Assemble using assembler assemble.sh
  3. Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator

Read the following two documents on how to use the assembler and simulator:

Below is a short demo:

prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt% wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information

Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished

prompt%

The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
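When checking a trace by hand, it helps to work out the expected register values first. The Python sketch below reproduces the values in the demo above, assuming (as the trace suggests, where lbi with 255 shows up as -1 and loads 0xffff) that lbi sign-extends its 8-bit immediate to 16 bits and that add wraps at 16 bits. The helper names lbi and add16 are illustrative, not part of any course tool.

```python
MASK16 = 0xFFFF


def lbi(imm):
    """Value loaded by lbi: the low 8 bits of imm, sign-extended to 16 bits."""
    imm &= 0xFF
    return imm | 0xFF00 if imm & 0x80 else imm


def add16(a, b):
    """16-bit add; any carry out of bit 15 is dropped."""
    return (a + b) & MASK16


# add_0.asm: lbi r1, 255 / lbi r2, 255 / add r3, r1, r2
r1 = lbi(255)
r2 = lbi(255)
r3 = add16(r1, r2)
print(f"R1 0x{r1:04x} R2 0x{r2:04x} R3 0x{r3:04x}")

# add_1.asm: lbi r1, 255 / lbi r2, 0 / add r3, r1, r2
r3_add_1 = add16(lbi(255), lbi(0))
```

For add_0.asm this gives R1 0xffff, R2 0xffff, and R3 0xfffe, which matches the simulator output shown above.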

What you need to do:

  • Write a set of tests for your instruction. Name them <instruction>_[0,1,2,3,4].asm
  • Use your discretion to decide how many tests you need
  • Identify corner cases. Think about possible bugs in the hardware.
  • In addition to your assigned instruction, everyone must write tests for the jal and jalr instructions
  • Write comments in your assembly code explaining what the test is doing
  • The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!

I will make all tests available to everyone, so you can use them to debug and test your verilog implementation as we near the demo 1 deadline. One of the first things you must do after putting together your full processor is run each of these tests to check each individual instruction.

What to submit

  1. Physical copy
    1. Written explanation of what your tests do and justification why your set of tests is comprehensive.
  2. Electronic submission copy
    1. All .asm files
    2. Submission Instructions

Page last modified on March 29, 2013
