CS552 Course Wiki: Spring 2008
Homework 4 Due 03/27 Important
Problem 1
In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source and your testing results. Use the exact same template as the one used in homework 3.
What to submit:
your HW3 register as a module and show any modifications/additions.
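The wrapper idea in Problem 1 can be modeled behaviorally before writing any Verilog. The following is a minimal Python sketch, not the HW3 template: the class names, the port names (`read1`, `read2`, `write_en`, `write_reg`, `write_data`), and the eight-register size are all assumptions made for illustration. It shows only the intended behavior: the inner register file is left unchanged, and the outer wrapper overrides a read port when that port's address matches an enabled write in the same cycle.

```python
class RegisterFile:
    """Hypothetical plain register file: reads see the state before this cycle's write."""

    def __init__(self, num_regs=8):  # register count is an assumption
        self.regs = [0] * num_regs

    def cycle(self, read1, read2, write_en, write_reg, write_data):
        out1, out2 = self.regs[read1], self.regs[read2]  # reads return old values
        if write_en:
            self.regs[write_reg] = write_data  # write takes effect at end of cycle
        return out1, out2


class BypassedRegisterFile:
    """Wrapper with the same interface; the inner register file is not modified."""

    def __init__(self, num_regs=8):
        self.inner = RegisterFile(num_regs)

    def cycle(self, read1, read2, write_en, write_reg, write_data):
        out1, out2 = self.inner.cycle(read1, read2, write_en, write_reg, write_data)
        # Bypass: forward the value being written when a read address matches it.
        if write_en and write_reg == read1:
            out1 = write_data
        if write_en and write_reg == read2:
            out2 = write_data
        return out1, out2


plain = RegisterFile()
bypassed = BypassedRegisterFile()
print(plain.cycle(3, 0, True, 3, 0xBEEF))     # same-cycle read sees the old value
print(bypassed.cycle(3, 0, True, 3, 0xBEEF))  # same-cycle read sees the new value
```

A test bench for the Verilog version would want the same three cases this model exercises: a read that matches the write address, a read that does not, and a matching address with the write enable deasserted.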
Problem 2
Indicate all of the true, anti-, and output-dependences in the following segment of MIPS assembly code:
xor $1, $2, $3
and $4, $5, $6
sub $7, $4, $5
add $5, $1, $5
sw $4, 100($7)
or $4, $7, $4
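The three dependence kinds can be enumerated mechanically, which is a useful cross-check on a hand analysis. The sketch below is one possible approach in Python, under a stated convention: a pair is reported only if no instruction between the two writes the register in question. The helper names (`parse`, `dependences`) and the small `NO_DEST` opcode set are illustrative assumptions, and instruction indices are 0-based.

```python
import re

# Opcodes that read registers but write none (assumed subset, enough for this listing).
NO_DEST = {"sw", "beq", "bne"}


def parse(line):
    """Return (dest, sources) for one MIPS instruction; dest is None if nothing is written."""
    op, operands = line.split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", operands)]
    if op in NO_DEST:
        return None, set(regs)
    return regs[0], set(regs[1:])


def dependences(program):
    """Classify register dependences as (earlier, later, register) triples."""
    instrs = [parse(line) for line in program]
    found = {"true": [], "anti": [], "output": []}
    for j, (dest_j, srcs_j) in enumerate(instrs):
        killed = set()  # registers written strictly between instruction i and instruction j
        for i in range(j - 1, -1, -1):
            dest_i, srcs_i = instrs[i]
            if dest_i is not None and dest_i in srcs_j and dest_i not in killed:
                found["true"].append((i, j, dest_i))        # read after write
            if dest_j is not None and dest_j not in killed:
                if dest_j in srcs_i:
                    found["anti"].append((i, j, dest_j))    # write after read
                if dest_i == dest_j:
                    found["output"].append((i, j, dest_j))  # write after write
            if dest_i is not None:
                killed.add(dest_i)
    return found


program = [
    "xor $1, $2, $3",
    "and $4, $5, $6",
    "sub $7, $4, $5",
    "add $5, $1, $5",
    "sw $4, 100($7)",
    "or $4, $7, $4",
]

for kind, triples in dependences(program).items():
    print(kind, sorted(triples))
```

Which of the reported pairs become pipeline hazards still depends on the pipeline in question, so the output is a starting list rather than an answer to the hazard questions.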
For the MIPS code above, which of the dependences will manifest themselves as hazards in the pipeline in Figure 6.17 on page 395 of COD3e? How are these hazards resolved in this pipeline? Assuming the 'xor' instruction enters fetch (F) in cycle 1, in what cycle does the 'or' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)
How does your answer change if you consider the pipeline in Figure 6.36 on page 416 of COD3e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)
Problem 3
Consider the following code sequence and the datapath in Figure 6.27 on page 404 of COD3e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction.) Show pipeline diagrams.
beq $2, $3, foo
add $3, $4, $5
sub $5, $6, $7
or $7, $8, $9
foo: and $5, $6, $7
Problem 4
Consider the pipeline in Figure 6.27 on page 404; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 461. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:
beq $1, $2, TARGET #assume that the branch is taken
lw $3, 40($4)
add $2, $3, $4
sw $2, 40($4)
TARGET: or $1, $1, $2
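As background for Problem 4, the stall side of the question comes from the load-use check performed by the hazard detection unit in the textbook's five-stage pipeline. The Python sketch below restates that condition; the function and parameter names are illustrative assumptions, not signal names from the course's Verilog, and it deliberately says nothing about how a stall should interact with a flush.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must stall behind a load in EX.

    id_ex_mem_read: True if the instruction in EX is a load
    id_ex_rt:       destination register (rt) of the instruction in EX
    if_id_rs/rt:    source register fields of the instruction in ID
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $3, 40($4) in EX while add $2, $3, $4 is in ID: the add reads $3.
print(load_use_stall(True, 3, 3, 4))

# add $2, $3, $4 in EX (rt = $4) while sw $2, 40($4) is in ID (rs = $4, rt = $2):
# a register field matches, but the instruction in EX is not a load.
print(load_use_stall(False, 4, 4, 2))
```

The first call reports a stall and the second does not, which is why the lw/add pair in the sequence above is the interesting one when the preceding branch is taken.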
Problem 5
Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:
Fragment 1:
movei $3, 10 // End-of-loop count
.
.
.
addi $2, $2, 1
beq $2, $3, Target
.
.
.
Target: ...
Fragment 2:
add $2, $2, $8
beq $2, $3, Target
lw $4, 0($7)
.
.
.
Target: sub $4, $5, $6
...
Fragment 3:
add $2, $2, $8
beq $2, $3, Target
lw $4, 0($8)
.
.
.
Target: lw $5, 0($7)
...
Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)
While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?
Problem 6
None. Moved to HW5. Leave problem 6 empty in your submitted homework.
Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure that what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your implementation. Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.
Student     Instruction
nobody      halt
nobody      nop
abhishek    addi
aidan       subi
akumar      xori
bemis       andni
bohl        roli
chue        slli
diep        rori
duwe-iii    srli
flanagan    st
fortin      ld
fridlund    stu
hollinge    btr
janaki      add
janderso    sub
jbyrne      xor
jhugo       andn
ju          rol
kirti       sll
luancong    ror
mcjunkin    srl
mckinley    seq
pulliam     slt
richard     sle
roberto     sco
ruhland     beqz
shan-hsi    bnez
simanek     bltz
sinclair    bgez
usha        lbi
vaishali    slbi
vviswana    j
waclawik    jr
all         jal
all         jalr
To get you started, below are two example tests for the add instruction.
add_0.asm
lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt
add_1.asm
lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt
You will notice one thing.
The workflow we will follow is:
Read the following two documents on how to use the assembler and simulator:
Below is a short demo:
prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img loadfile_1.img loadfile_2.img loadfile_3.img loadfile_all.img loadfile.lst
prompt% wiscalculator loadfile_all.img
WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information
Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000
R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000
R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000
R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000
R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished
prompt%
The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the
What you need to do:
I will make all tests available to everyone, so you can use these to debug and test your Verilog implementation as we near the demo 1 deadline. One of the first things you must do after putting together your full processor is to run each of these tests and test each individual instruction.
What to submit
Page last modified on March 17, 2008