- Homework is due at start of class 
- Problems 1 - 3 MUST be done with your project group    (all electronic:  handin to “hw4_1, hw4_2 and hw4_3”) 
- Problems 4 - 7 MUST be done ALONE                                  (all paper) 
- Problem 8 must also be done ALONE                                  (all electronic: handin to “TBA”) 
- No exceptions to the above handin rules will be allowed, as this is already unduly complicated (to grade). 
- You must abide by the Verilog file naming conventions 
- All Verilog code must pass Vcheck 
- Each problem must be in its own directory 
- If a problem requires files from a different directory, then create a copy of the file in each directory. 
Problem 1 – 10 Points

In Verilog, create a register file that includes internal bypassing so that results written in one cycle can be read during the same cycle. Do this by writing an outer "wrapper" module that instantiates your existing (unchanged) register file module; your new module will just add the bypass logic. The list of inputs and outputs of the outer module should be the same as that of the inner module. Submit your Verilog source code and your testing results.
	
- Call this module rf_bypass; it should be in a file called rf_bypass.v
- Modify rf_hier.v from problem 3 of HW2 so that it now instantiates rf_bypass instead of rf.
- The input and output interface for rf_bypass.v should be identical to that of rf.v
- Use the rf_bypass_bench.v testbench; see its usage instructions.
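
The intended behavior of the wrapper can be illustrated with a small behavioral model. This is only a sketch in Python, not Verilog and not a solution: the port names (read1_sel, read2_sel, write_sel, write_data, write_en), the two read ports, and the 8 x 16-bit sizing are assumptions made for illustration, so use the interface of your own rf.v instead.

```python
# Behavioral sketch only. Port names and sizes are assumptions,
# not the actual interface of the course's rf.v.

class RegFile:
    """Inner register file: reads see the value stored before this cycle's write."""

    def __init__(self, num_regs=8, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        # Reads observe the current (old) state.
        read1, read2 = self.regs[read1_sel], self.regs[read2_sel]
        # The write takes effect at the clock edge, after the reads.
        if write_en:
            self.regs[write_sel] = write_data & self.mask
        return read1, read2


class RegFileBypass:
    """Wrapper with the same interface; the inner RegFile is left unchanged."""

    def __init__(self, num_regs=8, width=16):
        self.inner = RegFile(num_regs, width)

    def cycle(self, read1_sel, read2_sel, write_sel, write_data, write_en):
        read1, read2 = self.inner.cycle(
            read1_sel, read2_sel, write_sel, write_data, write_en)
        bypass_value = write_data & self.inner.mask
        # Bypass: a read of the register being written returns the new data.
        if write_en and write_sel == read1_sel:
            read1 = bypass_value
        if write_en and write_sel == read2_sel:
            read2 = bypass_value
        return read1, read2
```

With this model, writing register 3 and reading it in the same cycle returns the old value from RegFile but the newly written value from RegFileBypass; when write_en is low, both behave identically.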
		
		 
What to submit: (Directory name: hw4_1)

- Describe precisely how you augmented your hw3 register file in README.txt
- Any modifications to the testbench, if required. If you use the testbench provided, electronically submit the text output of the program as rf_bench.out (see 4 below). Modelsim will write the text output to a file called transcript in your project directory.
- All your Verilog source code.
	
	
	
Problem 2 – 10 Points

Read the Synthesis Tutorial.
Synthesize your register file from homework 3.

Synthesis will create the synth directory, which will include rf.syn.v, the area report, the timing report, etc.

What to submit: (Directory name: hw4_2)

- Verilog files from hw3's register file
- Add the entire synth directory
- Make sure rf.syn.v and the 4 report files are present (make sure that in the area report no cell has an area of zero)
- In the readme, fill in this info:
	- Total area
	- Worst case slack
 
	
	
	
	
	
Problem 3 – 10 Points

Read the Synthesis Tutorial.
Synthesize your FIFO from homework 3.

Synthesis will create the synth directory, which will include fifo.syn.v, the area report, the timing report, etc.

What to submit: (Directory name: hw4_3)

- Verilog files from hw3's fifo
- Add the entire synth directory
- Make sure fifo.syn.v and the 4 report files are present (make sure that in the area report no cell has an area of zero)
- In the readme, fill in this info:
	- Total area
	- Worst case slack
 
	
	
	
	
#end group work#
	
Problem 4 – 15 Points

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction.) Show pipeline diagrams.

          beq     $2, $3, foo
          add     $3, $4, $5
          sub     $5, $6, $7
          or      $7, $8, $9
     foo: and     $5, $6, $7
	
Problem 5 – 15 Points

For each of the three MIPS assembly code segments: (a) indicate the dependences and their types; (b) assuming that there is NO forwarding in the pipelined processor, indicate hazards and add NOP instructions to eliminate them; (c) assuming there is FULL forwarding in the pipelined processor, indicate hazards and add NOP instructions to eliminate them.

(a)          add     $4, $4, $2
             sub     $5, $3, $1
             lw      $6, 200($3)
             add     $7, $3, $6
(b)          lw      $1, 40($6)
             add     $6, $2, $2
             sw      $6, 50($1)
(c)          lw      $5, -16($5)
             sw      $5, -16($5)
             add     $5, $5, $5
	
	
Problem 6 – 10 Points

COD4e, Exercises 4.24.1 - 4.24.3 (page 432), with the changes below:
(a) Take the pattern as T, T, NT, T and (b) take the pattern as T, T, T, NT, NT
	
	
Problem 7 – 15 Points

Consider a pipeline where branches are predicted not-taken, and a taken branch introduces a three-cycle penalty. Suppose you are considering adding a delayed branch slot to your instruction set architecture, so that taken branches would only have a two-cycle penalty. Consider the following three fragments of code:
	Fragment 1:
        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($2)
        .
        .
        .
Target: lw $1, 0($7)
        ...
Fragment 2:
        add $5, $5, $2
        beq $5, $6, Target
        lw $4, 0($7)
        .
        .
        .
Target: sub $4, $8, $3
        ...
Fragment 3:
        movei $2, 21  // End-of-loop count
        .
        .
        .
        addi $4, $4, 1
        beq $4, $2, Target
        .
        .
        .
Target: ...
Re-arrange or re-write each of the fragments so that it will work correctly with a branch delay slot and maximize performance. (The dots represent an unknown amount of other code that you can't change.) What is the average number of cycles that were saved or lost in each case if you used the delayed branch architecture? (Assume branches are taken 60% of the time.)

While a good idea at the time, branch delay slots are discouraged in modern processors with deep pipelines in favor of dynamic branch predictors. Why do you think this is so? Why would a branch delay instruction perform poorly in a long pipeline?
	
Problem 8 – 15 Points

Develop instruction-level tests for your processor. In this problem each of you will develop a set of small programs that are meant to test whether your processor implements these instructions correctly. You will write these programs in assembly and run them on an instruction emulator to make sure what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your implementation.

Each of you will be responsible for one instruction and must develop a set of simple programs for that instruction. The table below gives the assignment of instructions to students.
	
		
		
		
| ahmad | addi |
| alexm | subi |
| ampomah | ori |
| andracek | andi |
| ativut | roli |
| bayer | bltz |
| chao | rori |
| chunw | srai |
| danielr | st |
| davidm | ld |
| deblon | stu |
| dexter | add |
| dimitrio | sub |
| dmiller | or |
| dragga | and |
| eichers | rol |
| fessler | sll |
| fisher | ror |
| foss | sra |
| gola | seq |
| harter | slt |
| harwell | sle |
| hittson | sco |
| hongzhuo | beqz |
| huanchen | bnez |
| ishani | lbi |
| jaffke | slbi |
| jalal | j |
| jiaduo | jr |
| jitrapon | jal |
| jliu | jalr |
| jui-chie | sll |
| justmann | slt |
| katelyn | slli |
| kolp | bnez |
| kpark38 | ori |
| kulcyk | sle |
| lars | bgez |
| little | srai |
| mcc | st |
| mgm | ror |
| mschmid | roli |
| nwilliam | jalr |
| parvi | addi |
| pjohnson | subi |
| redderse | add |
| roberts | sub |
| sato | jalr |
| schleife | beqz |
| shanpeng | bnez |
| shubham | sco |
| skobov | sle |
| sok | seq |
| starr | slt |
| suli | sll |
| swiercze | sra |
| theodor | st |
| tong | btr |
| vander-p | srai |
| van-maas | roli |
| weisnich | and |
| wysocki | rori |
| yaman | sll |
| yashashr | bltz |
| zxie | btr |
	
	
To get you started, below are two example tests for the add instruction.

add_0.asm
	lbi r1, 255
	lbi r2, 255
	add r3, r1, r2
	halt
add_1.asm
	lbi r1, 255
	lbi r2, 0
	add r3, r1, r2
	halt

You will notice one thing: the add test also uses the lbi instruction! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions. Identify different corner cases and the common case for your instruction and develop a set of simple test programs.
	
The work flow we will follow is:

- Write the test in WISC-SP12 assembly language.
- Assemble using the assembler: assemble.sh
- Simulate the test in the simulator and make sure your test is doing what you thought it was doing. Use the simulator: wiscalculator
 
	
	
Below is a short demo:
	prompt% assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst
prompt% wiscalculator loadfile_all.img
WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information
Loading program...
Executing...
lbi r1, -1
PC: 0x0002 EPC 0x0000R0 0x0000 R1 0xffff R2 0x0000 R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
lbi r2, -1
PC: 0x0004 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0x0000 R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
add r3, r1, r2
PC: 0x0006 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
program halted
PC: 0x0008 EPC 0x0000R0 0x0000 R1 0xffff R2 0xffff R3 0xfffe R4 0x0000 R5 0x0000 R6 0x0000 R7 0x0000
Program Finished
prompt%
The simulator will print a trace of each instruction along with the state of the registers. You should examine these to make sure that your test is indeed doing what is expected. For the st instruction you will need to examine memory also.
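
When reading the trace, it helps to predict the register values by hand first. The Python sketch below reproduces the arithmetic seen in the add_0 demo. It assumes, based only on the trace above (where lbi r1, 255 is printed as lbi r1, -1 and leaves R1 at 0xffff), that lbi sign-extends an 8-bit immediate into a 16-bit register and that add wraps at 16 bits; the helper names are hypothetical, and the ISA specification remains the authority.

```python
# Sketch of the arithmetic in the add_0 demo. The sign-extension and
# 16-bit wraparound rules are inferred from the trace, not from the ISA spec.

def sign_extend_8(imm):
    """Treat the low 8 bits of imm as signed and widen to a 16-bit pattern."""
    imm &= 0xFF
    return (imm | 0xFF00) if imm & 0x80 else imm


def add16(a, b):
    """Add two 16-bit values, discarding any carry out of bit 15."""
    return (a + b) & 0xFFFF


r1 = sign_extend_8(255)   # lbi r1, 255
r2 = sign_extend_8(255)   # lbi r2, 255
r3 = add16(r1, r2)        # add r3, r1, r2
print(hex(r1), hex(r2), hex(r3))  # 0xffff 0xffff 0xfffe
```

The printed values match R1, R2 and R3 in the final line of the demo trace. Doing the same calculation for each of your own tests gives you an expected result to compare against the simulator output.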
	
What you need to do:

- Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
- Use your discretion to decide how many tests you need
- Identify corner cases. Think about possible bugs in the hardware.
- Write comments in your assembly code explaining what the test is doing
- The goal of this problem is to make sure you understand the ISA and develop targeted tests for the hardware. Understanding the ISA is required before building hardware for it!
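
A quick way to check the naming convention before handing in is sketched below in Python. It assumes the pattern means the opcode, an underscore, a single index from 0 to 4, and the .asm extension; the function name is hypothetical and this is not part of any course tooling.

```python
import re


def is_valid_test_name(filename, opcode):
    """Return True if filename has the form <opcode>_<index 0-4>.asm."""
    pattern = re.escape(opcode) + r"_[0-4]\.asm"
    return re.fullmatch(pattern, filename) is not None


# Example: names for tests of the add instruction.
print(is_valid_test_name("add_0.asm", "add"))   # True
print(is_valid_test_name("add0.asm", "add"))    # False
```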
I will make all tests available to everyone, so you can use these to debug and test your Verilog implementation. One of the first things you must do after putting together your full processor is to run each of these tests and test each individual instruction.
	
	
What to submit

- Physical copy
	- Written explanation of what your tests do and justification of why your set of tests is comprehensive

Electronic submission instructions

Create a folder named “prob_inst”:

- All your assembly files must be in this directory
- Write a set of tests for your instruction. Name them <opcode>_[0,1,2,3,4].asm
- Use your discretion to decide how many tests you need
- Identify corner cases. Think about possible bugs in the hardware.
- Written explanation of what your tests do and justification of why your set of tests is comprehensive (copy the contents of your written explanation into a README.txt)
Handin Instructions
Hand in your homework using the CS handin program.
- Make a folder for each problem (hw4_1, hw4_2, hw4_3 and prob_inst)
- Each folder should contain all the Verilog files for that problem. prob_inst should contain your .asm files
- The name and signals for the top-level module should be as indicated for each problem
- Tar these 4 folders to 'cs username'.tar [example: tar cvf ram.tar hw4_1 hw4_2 hw4_3 prob_inst]
- Copy the tar file over to an empty folder and submit it as described in the handin documentation [example: mkdir ram; mv ram.tar ram]
 * <class_name> cs552-1
 * <assignment_name> HW4
 * <directory_path> 'location_of_the folder_you_created'