Deadlines and grading

Date Project
4-Feb Form project team (Feb 8th)
23-Feb Project plan
3-Mar Design Review
17-Mar Demo 1
12-Apr Demo 2
14-Apr Cache FSM turnin
21-Apr Cache Demo
9-May Final demo
10-May Final report

There are four major deadlines over the course of your term project design, which will be met in the form of project demos with the course TA and a final project report. During a demo, it is important that both team members possess a conceptual understanding of the entire design. Answers such as "I don't know, my partner did that" will not be acceptable. However, a response such as "I didn't implement that part of the design, but it works in the following way..." is perfectly fine.

Teams should be well prepared before showing up to a demo. Time is limited and your grade may be negatively impacted if the demo cannot be completed. Be sure that the designs you hand in work without alteration, so that the TA can easily compile and simulate the design without special instructions.

1.  Rough Overview:

You can think of this project as having roughly six stages of development with several demos along the way.

  1. You will first build a single cycle non-pipelined processor with a highly idealized memory
  2. Your processor can then be pipelined into distinct stages but while still using a highly idealized memory
  3. The memory will then be transitioned to using a more realistic banked memory module that cannot respond to requests in a single cycle
  4. A cache can then be implemented that can be used to improve the now degraded memory performance
  5. Once the cache has been fully verified it can be incorporated into the full processor
  6. Optimizations can then be added for additional processor performance

2.  Form team:

The project is done in groups of two. These groups should be formed no later than February 19th. A google doc will be made available to specify your team.

3.  Project plan: (2% of project grade)

Each group needs to turn in a typed report (one to two page single-spaced) describing your project design and test plan. You are expected to develop a detailed schedule identifying key milestones and a breakdown of the tasks by project partner. Make sure that your schedule takes into account the remaining homework assignments and your other course obligations (e.g., midterms).

You must have thought about the design at a high level and about the partitioning of work between you and your partner. The plan you come up with will be your master plan for the semester, and you will be asked to update or revise it as we go along.

In addition to the design, you are expected to develop a detailed test plan, including high-level descriptions of component, module, and system tests. Include both project members' names, email addresses, and the team name on the report.

Look through the course calendar for the design-review, demo-1, demo-2, cache-demo, and final-demo dates and plan your work accordingly. These dates are non-negotiable and you must adhere to them. There will be a signup for a 15 minute meeting for design-review. Depending on how things shape up, we may do a signup and meetings for demo-1, demo-2, cache-demo and final-demo also.

Bring this report (printed) to class on the due date.

4.  Design review: (4% of project grade)

Each group needs to create a complete hand-drawn (or drawn with the aid of a graphing program like Openoffice draw) schematic of an unpipelined WISC-SP13 implementation. Each module, bus, and signal should be uniquely labeled. The schematic should be hierarchical so that the top level design contains only empty shells for each planned submodule. In general, there will be a one-to-one mapping of modules in your schematic to the modules you will eventually write in Verilog.

While explicitly drawing pipeline stages in the schematic is not required, you should still design with a pipeline in mind. It is a good idea to place modules near their final location in the pipelined design.

During the review, individual team members should be able to describe the datapath of any legal WISC-SP13 instruction using the schematic as a reference. Teams will also be expected to defend the design decisions that they make. You need to have thought through the control path and decode logic. It is not necessary to have completed a full table of signals, but if you have such a table with the control signal values for every instruction, that would be great.

Signup instructions are posted. You should sign up for a time-slot in the google doc. Write each partner's last name against a time-slot. If none works, discuss a possible swap with your classmates. If you still cannot find a time-slot that works, email both the TAs and Karu.

Both partners are required to be present and both are expected to explain and answer questions about the whole design. Answering a question with: "I have no idea, my partner did that" is a failing answer. You must (at least) be able to answer: "My partner implemented that, but it works in the following way....".


5.  Demo #1 - Unpipelined design (14% of project grade)

All of the files you will need for the project are in a project tar file. You should download and untar this while getting started.

To start, you should do a single-cycle, non-pipelined implementation. Figure 4.24 on page 271 is a good place to start.

For this stage, you will use the Single cycle perfect memory. Since you will need to fetch instructions as well as read or write data in the same cycle, use two memories -- one for instruction memory and one for data.

Your design should be running the full WISC-SP13 instruction set, except for the extra-credit instructions. It should use the single-cycle memory model. You should run vcheck and your files must all pass vcheck.

In the demo you will run a set of programs on your processor using the wsrun.pl script (check the verification and simulation page for more info) and show that your processor works on the test programs (full list in Test Programs page). You should run the tests under the following three categories:

  1. Simple tests
  2. Complex tests
  3. Random tests for demo1
    1. rand_simple
    2. rand_complex
    3. rand_ctrl
    4. rand_mem

If you have more than two failures in the simple tests, you will automatically lose 75% of the demo1 grade.

Use the -list option to run each category of tests. When you run wsrun.pl with the -list option, it will generate a file called summary.log, which looks like this:

add_0.asm SUCCESS CPI:1.3 CYCLES:12 ICOUNT:9 IHITRATE: 0 DHITRATE: 0

add_1.asm SUCCESS CPI:1.7 CYCLES:7 ICOUNT:4 IHITRATE: 0 DHITRATE: 0

add_2.asm SUCCESS CPI:1.7 CYCLES:7 ICOUNT:4 IHITRATE: 0 DHITRATE: 0

SUCCESS means the test passed. Run all the categories and rename the summary.log files as shown below:

  1. Simple tests: simple.summary.log
  2. Complex tests: complex.summary.log
  3. Random tests for demo1
    1. rand_simple: rand_simple.summary.log
    2. rand_complex: rand_complex.summary.log
    3. rand_ctrl: rand_ctrl.summary.log
    4. rand_mem: rand_mem.summary.log

The log files MUST have the exact name. These are the log files produced by running wsrun.pl -list with the all.list file for each of those sets of benchmarks. You will have to rename summary.log manually into these names. If your handed in code does not follow this convention, it will not be accepted and you will receive a zero for this demo. If in doubt about what to submit, email the TA *before* the deadline and double-check.

You should do rigorous testing and verification and should try to have zero failures on the other categories. It is ok to have a very small number of failures - but for every failure you must know the reason. You will submit your design electronically, which will be graded automatically. The instructor will then schedule one-on-one appointments with teams that have exhibited a large number of failures.
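Since every failure needs an explanation, it can help to list the failing tests in each log before submitting. The following is a minimal Python sketch, assuming each non-blank line of summary.log starts with the test name followed by a status word, as in the sample shown earlier. Treating any status other than SUCCESS as a failure is an assumption, and the helper name `parse_summary` and the failing sample line are made up for illustration.

```python
def parse_summary(text):
    """Split summary.log text into (passed, failed) lists of test names."""
    passed, failed = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[1]
        # Assumption: any status other than SUCCESS counts as a failure.
        if status == "SUCCESS":
            passed.append(name)
        else:
            failed.append(name)
    return passed, failed


sample = (
    "add_0.asm SUCCESS CPI:1.3 CYCLES:12 ICOUNT:9 IHITRATE: 0 DHITRATE: 0\n"
    "\n"
    "add_1.asm SUCCESS CPI:1.7 CYCLES:7 ICOUNT:4 IHITRATE: 0 DHITRATE: 0\n"
    "made_up.asm FAILED\n"  # hypothetical failing line; the real text may differ
)
passed, failed = parse_summary(sample)
print(len(passed), "passed; failed:", failed)
```

To check a real log, read the file contents (for example simple.summary.log) and pass the text to `parse_summary`.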

In this demo, you must also synthesize your processor and submit the results of synthesis, including the area and timing reports. If there are any synthesis errors you will get ZERO for the entire demo1.

Everything due before class.

Electronic submission instructions

Submit a single demo1.tar containing the following directories [ tar cvf demo1.tar verilog summary synthesis ]. These sub-directories will already exist if you do your work in the demo1 directory from the original tar that was provided.

  1. The sub-directories should contain the following files
    1. verilog/ containing all verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
    2. synthesis/ containing area_report.txt, cell_report.txt, timing_report.txt.
    3. summary/ containing the 6 summary.log files.

If the summary.log files are missing, you will automatically get zero points.
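Because a missing or misnamed file costs the whole demo, a quick check of the archive contents before handing it in may be worthwhile. The sketch below is one possible check in Python, not part of the course tooling: the file names come from the lists above, while the function names `missing_files` and `has_verilog` and the example member list are hypothetical.

```python
# Paths expected inside demo1.tar, taken from the submission instructions.
REQUIRED = [
    "summary/simple.summary.log",
    "summary/complex.summary.log",
    "summary/rand_simple.summary.log",
    "summary/rand_complex.summary.log",
    "summary/rand_ctrl.summary.log",
    "summary/rand_mem.summary.log",
    "synthesis/area_report.txt",
    "synthesis/cell_report.txt",
    "synthesis/timing_report.txt",
]


def missing_files(member_names):
    """Return the required paths that are absent from the archive listing."""
    present = set(member_names)
    return [path for path in REQUIRED if path not in present]


def has_verilog(member_names):
    """True if at least one .v file sits under verilog/."""
    return any(n.startswith("verilog/") and n.endswith(".v") for n in member_names)


# Hypothetical, incomplete listing used only to show the output shape.
names = ["verilog/proc.v", "summary/simple.summary.log", "synthesis/area_report.txt"]
print("missing:", missing_files(names))
print("has verilog:", has_verilog(names))
```

For a real archive, the member list can be obtained with the standard library, for example `tarfile.open("demo1.tar").getnames()`.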

5.1  Single-Cycle Memory Specification

Since your single-cycle design must fetch instructions as well as read or write data in the same cycle, you will want to use two instances of this memory -- one for data, and one for instructions.

Note: You should instantiate this memory module twice. One instance will serve as the instruction memory while the other will serve as the data memory. The program binary should be loaded into both instances. This happens without any additional effort on your part if you use the same module definition for both instances.

                      +-------------+ 
data_in[15:0] >-------|             |--------> data_out[15:0] 
   addr[15:0] >-------| 65536 word  | 
       enable >-------| by 8  bit   |
           wr >-------| memory      |
          clk >-------|             |
          rst >-------|             |
   createdump >-------|             |
                      +-------------+

During each cycle, the "enable" and "wr" inputs determine what function the memory will perform:

enable  wr  Function       data_out
0       X   No operation   0
1       0   Read           M[addr]
1       1   Write data_in  0

During a read cycle, the data output will immediately reflect the contents of the address input and will change in a flow-through fashion if the address changes. For writes, the "wr", "addr", and "data_in" signals must be stable at the rising edge of the clock ("clk").
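The enable/wr behavior can be mirrored in a small software model, which is sometimes handy for predicting what a test program should leave in memory. This is a rough Python sketch and not a substitute for memory2c.v: the class and method names are hypothetical, and the byte order used to combine two 8-bit locations into a 16-bit word (high byte at the lower address) is an assumption that should be checked against the Verilog source.

```python
class SingleCycleMemoryModel:
    """Toy model of the enable/wr function table."""

    def __init__(self):
        self.mem = bytearray(65536)  # 65536 locations, 8 bits each

    def data_out(self, enable, wr, addr):
        """Flow-through output: M[addr] on a read, 0 in every other case."""
        if enable and not wr:
            hi = self.mem[addr & 0xFFFF]
            lo = self.mem[(addr + 1) & 0xFFFF]
            return (hi << 8) | lo  # assumed byte order
        return 0

    def rising_edge(self, enable, wr, addr, data_in):
        """State change at the clock edge: only enable=1, wr=1 writes."""
        if enable and wr:
            self.mem[addr & 0xFFFF] = (data_in >> 8) & 0xFF
            self.mem[(addr + 1) & 0xFFFF] = data_in & 0xFF


m = SingleCycleMemoryModel()
m.rising_edge(1, 1, 0x0010, 0xBEEF)       # write cycle
print(hex(m.data_out(1, 0, 0x0010)))      # read cycle
print(m.data_out(0, 0, 0x0010))           # disabled: output is 0
```

Splitting the model into a combinational `data_out` and a clocked `rising_edge` follows the description above: reads flow through immediately, while writes take effect only at the clock edge.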

The memory is initialized from a file. The file name is "loadfile_all.img", but you may change that in the Verilog source to any file name you prefer. The file is loaded at the first rising edge of the clock during reset. The simulator will look for the file in the same location as your .v files (or the directory from which you run wsrun.pl). The file format is:

@0
12
12
12
12
where "@0" specifies a starting address of zero, and "12" represents any 2-digit hex number. Any number of lines may be specified, up to the size of the memory. The assembler will produce files in this format.
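To inspect what an assembled program will place in memory, the image can be read back with a few lines of Python. This is a minimal sketch based only on the format described above; the function name `parse_loadfile` is hypothetical, and reading the number after "@" as hexadecimal is an assumption.

```python
def parse_loadfile(text):
    """Return {address: byte value} for a loadfile image given as a string."""
    image = {}
    addr = 0
    for raw in text.splitlines():
        token = raw.strip()
        if not token:
            continue  # ignore blank lines
        if token.startswith("@"):
            addr = int(token[1:], 16)  # assumption: address is hexadecimal
        else:
            image[addr] = int(token, 16)  # one 2-digit hex value per line
            addr += 1
    return image


example = "@0\n12\n34\n56\n78\n"
for address, value in parse_loadfile(example).items():
    print(f"{address:04x}: {value:02x}")
```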

At the end of the simulation, the memory can produce a dumpfile so that you may determine what has been written to the memory. When "createdump" is asserted at the rising edge of the clock, the memory will create a file named "dumpfile" in the mentor directory. You may want to use the decode of the "halt" instruction to assert "createdump" for a single cycle.

When a dumpfile is created, it will contain locations zero through the highest address that has been modified with a write cycle (not the highest address loaded from the loadfile). The format is:

0000 1234
0001 1234
0002 1234
Examining the source file memory2c.v, several possible changes should be obvious. The names of the files may be changed. The format of the dumpfile may be changed by modifying the $fdisplay statement; the syntax is very similar to C's fprintf statement. The starting and ending addresses to dump may be modified in the "for" statement. The only thing that cannot be modified is the format of the loadfile; that is built-in.

When you have two copies of the memory, for instructions and data, you may want to let both memories load the same loadfile, but only have the data memory generate a dumpfile.

To load a program for your processor, use the assembler to create the memory dump, name it loadfile_all.img, and copy it into the directory where memory2c.v is present.


6.  Demo #2.0 - Pipelined design with Perfect Memory (30% of project grade)

At this point, the pipelined version of your design needs to be running correctly, but no optimizations are needed yet. Correctly means that it must detect and do the right thing on pipeline hazards (e.g., stall). You will still use the single-cycle memory model. We will follow similar protocol as demo1. I will run your tests and ask teams with any failures to signup for a demo with me.

In this demo, you must also synthesize your processor and submit the results of synthesis, including the area and timing reports. If there are any synthesis errors you will get ZERO for the entire demo2.

We recommend that you write at least two additional hand-written tests to test pipelining. Writing more will help simplify debugging. If you write additional tests, include them in verification/mytests/.

You must create and submit a document which should give an explanation of the behavior of your processor for the perf-test-dep-ldst.asm test. Please use the following format:

Cycle  Instruction Retired  Reason
1
2
etc

The instruction retired would either be one of the instructions from the test program or a "NOP" if dependencies necessitate any stall cycles. The reason column would give an explanation of why a stall was needed in that instance. Please include this information in a pdf file titled instruction_timeline.pdf.

Everything due before class.

What to submit

Electronic submission instructions

Submit a single demo2.tar file containing the following directories [tar -cvf demo2.tar verilog verification synthesis]. These sub-directories will already exist if you do your work in the demo2 directory from the original tar that was provided.

  1. verilog/ containing all verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
  2. verification/mytests/ The assembly (.asm) files that you have written.
  3. verification/results/ Run all the categories and rename the summary.log files as shown below:
    1. Simple tests: simple.summary.log
    2. Complex tests: complex.summary.log
    3. Random tests for demo1
      1. rand_simple: rand_simple.summary.log
      2. rand_complex: rand_complex.summary.log
      3. rand_ctrl: rand_ctrl.summary.log
      4. rand_mem: rand_mem.summary.log
    4. Random tests for demo2: complex_demo2.summary.log
    5. Your code results: mytests.summary.log
  4. verification/instruction_timeline.pdf - The timeline you have created for the retiring instructions of perf-test-dep-ldst.asm
  5. synthesis/ the area and timing report

The log files MUST have the exact name. These are the log files produced by running wsrun.pl -list with the all.list file for each of those sets of benchmarks. You will have to rename summary.log manually into these names. If your handed in code does not follow this convention, it will not be accepted and you will receive a zero for this demo. If in doubt about what to submit, email the TA *before* the deadline and double-check.


The next few demos are minor changes to your processor and you should plan on doing them very quickly. They are optional. No print or electronic submissions required. April 24th is simply a suggested date. Make sure all demo2 tests pass at this phase.

7.  Demo #2.1 - Pipelined design with Aligned Memory (0% of project grade)

No Submission required. This is optional.

At this step, replace the original single-cycle memory with the Aligned single cycle memory. This is a very similar module, but it has an "err" output that is generated on unaligned memory accesses. Your processor should halt when an error occurs. Verify your design.

7.1  Aligned Single-Cycle Memory Specification

Before building your cache, you should use this memory to update and test your processor's interface to properly handle unaligned accesses. Many processors (e.g., MIPS) are byte addressable, but require that all accesses be aligned to their natural size (i.e., byte loads and stores can access any individual byte, but word loads and stores must access aligned words). Since your processor only has word loads and stores, this is pretty simple (to support byte stores, the memory would need byte write enable signals; to support byte loads, either the memory or the processor needs a mux to select the right byte). Notice that the memory always returns aligned data even on a misaligned load.
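For 16-bit words in a byte-addressed memory, the alignment test comes down to the lowest address bit. The Python sketch below illustrates the idea only; the function names are hypothetical, and clearing the low bit to form the aligned address is an assumption about how the memory arrives at aligned data on a misaligned load.

```python
def is_unaligned(addr):
    """A word access is unaligned when the lowest address bit is set."""
    return (addr & 1) == 1


def aligned_word_address(addr):
    """Assumed aligned address: the same address with bit 0 cleared."""
    return addr & 0xFFFE


for addr in (0x0000, 0x0001, 0x00FE, 0x00FF):
    print(f"{addr:04x} unaligned={is_unaligned(addr)} "
          f"aligned={aligned_word_address(addr):04x}")
```

In hardware the same check is a single wire (address bit 0), which is why supporting only word loads and stores keeps this step simple.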

The verilog source (memory2c_align.v) and synthesizable version (memory2c_align.syn.v) were included in the project tar.

Since your single-cycle design must fetch instructions as well as read or write data in the same cycle, you will want to use two instances of this memory -- one for data, and one for instructions.

                      +-------------+
data_in[15:0] >-------|             |--------> data_out[15:0]
   addr[15:0] >-------| 65536 word  |
       enable >-------| by 16 bit   |--------> err
           wr >-------| memory      |
          clk >-------|             |
          rst >-------|             |
   createdump >-------|             |
                      +-------------+

During each cycle, the "enable" and "wr" inputs determine what function the memory will perform. On an unaligned access, err is set.

enable  wr  Function      data_out          err
0       X   No operation  0                 0
1       0   Read          M[addr]           0
1       1   Write         Write data_in     0
1       X   X             if (data[0]) set  1

8.  Demo #2.2 - Pipelined design with Stalling Memory : 1 week after demo 2.0 (0% of project grade)

No Submission required. This is optional.

At this step, replace the single cycle memory with the Stalling memory. This is a very similar module, but has stall and done signals similar to the cache you built. Your pipeline will need to stall to handle these conditions. Verify your design.

  • Instruction memory: First replace your instruction memory module with this stalling memory, and keep your data memory module the same (i.e. aligned perfect memory from previous step). Verify your design. This will be easier to debug, as only one module's behavior has changed.
  • Data memory: Now, replace your data memory module alone with this stalling memory, and revert your instruction memory module back to the aligned perfect memory. Verify your design. This will be easier to debug, as only one module's behavior has changed.
  • Instruction and Data memory: Now change both instruction and data memories to the stalling memory design. Verify your design.

8.1  Stalling Memory Specification

This module has an interface identical to the cache interface in mem_system_hier.v, with the same semantics.

Examining the source file stallmem.v, you will see "rand_pat", a shift register which controls the "ready" output. This is a random 32-bit number. You can change its value by changing the seed used for random number generation. You can do this by passing "-seed" to wsrun.pl. For example:


wsrun.pl -seed 45 -prog foo.asm proc_hier_pbench *.v

If you are executing from inside ModelSim with run -All or using a testbench of your own for preliminary testing, you can pass in the seed, by adding the string "+seed=<value>" to the vsim command. Or simply edit stallmem.v and set the seed to a different value.
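Since each seed produces a different stall pattern, running the same program under several seeds can expose timing bugs that a single run hides. The Python sketch below only builds the command lines, reusing the flags from the example above; the helper name `seed_commands` and the particular seed values are arbitrary choices for illustration.

```python
def seed_commands(prog, seeds, top="proc_hier_pbench", sources="*.v"):
    """Build one wsrun.pl command line per seed, mirroring the example above."""
    return [f"wsrun.pl -seed {seed} -prog {prog} {top} {sources}" for seed in seeds]


# Arbitrary seeds; any set of distinct values would do.
for command in seed_commands("foo.asm", [1, 45, 1000]):
    print(command)
```

The printed lines can be pasted into a shell or a small script; the sketch deliberately does not run them itself.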


9.  Cache Demo - Working two-way set-associative cache (15% of project grade)

All information on the cache design and submission instructions can be found on the cache design page. Note that an FSM diagram is due a week earlier than the demo.


10.  Demo #3 (final demo) - Pipelined Multi-cycle Memory with Optimizations (30% of project grade)

  • Due May 9th, 5pm.
  • Absolutely no extensions.
  • If you have more than 2 failures (not counting aligntest and extracredit failures) you will receive at least a 50% penalty.

At this final demo teams are expected to demonstrate the complete design to all specifications. This includes the following required items:

  • Two-way set-associative caches with multi-cycle memory
  • Register file bypassing
  • Bypassing from beginning of the MEM stage to beginning of EX stage
  • Bypassing from beginning of the WB stage to the beginning of the EX stage
  • Branches predicted non-taken
  • Halt instructions must leave the PC pointing to Halt+2. Do not let it increment past this address
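The two EX-stage bypass paths listed above reduce to a priority choice for each source operand. The following Python sketch shows that selection logic in software form; it is an illustration under assumptions, not the required implementation. All signal names (`exmem_writes`, `exmem_rd`, `memwb_writes`, `memwb_rd`) are hypothetical, and the sketch ignores the case of a load in the MEM stage, whose data is not yet available at the start of that stage and would still need a stall.

```python
def forward_select(src_reg, exmem_writes, exmem_rd, memwb_writes, memwb_rd):
    """Pick the source for one EX-stage operand: "MEM", "WB", or "REG"."""
    # The instruction now in MEM is younger than the one in WB, so its
    # result takes priority when both write the same register.
    if exmem_writes and exmem_rd == src_reg:
        return "MEM"
    if memwb_writes and memwb_rd == src_reg:
        return "WB"
    return "REG"  # no hazard: use the value read from the register file


print(forward_select(3, True, 3, True, 3))    # both match: MEM wins
print(forward_select(3, False, 3, True, 3))   # only WB writes r3
print(forward_select(3, True, 5, True, 6))    # no match
```

In Verilog the same decision would typically become the select input of a mux in front of each ALU operand.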

Format will be similar to demo1.

What to submit:

Electronic submission instructions Submit a single demo3.tar file containing the following directories [tar -cvf demo3.tar verilog verification synthesis]. These sub-directories will already exist if you do your work in the demo3 directory from the original tar file that was provided.

  1. verilog/ containing all verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
  2. verification/mytests/ The assembly (.asm) files that you have written (at least two tests).
  3. verification/results/ Run all test programs and rename the summary.log files as listed below:
    1. perf.summary.log
    2. complex_demofinal.summary.log
    3. rand_final.summary.log
    4. rand_ldst.summary.log
    5. rand_idcache.summary.log
    6. rand_icache.summary.log
    7. rand_dcache.summary.log
    8. complex_demo1.summary.log
    9. complex_demo2.summary.log
    10. rand_complex.summary.log
    11. rand_ctrl.summary.log
    12. inst_tests.summary.log
  4. synthesis/ the area and timing report (no reports or zero area => zero for this demo)
You can use the script run-final-all.sh to run all the required tests. It will create all these summary.log files.
Running all the tests will take about 40 minutes. So plan ahead!

I will grade this submission electronically. If you have more than 2 failures, you will receive at least a 50% penalty. TAs will be holding office hours 1-5 on Wednesday the 11th for any teams that have failures and need to meet about partial credit. If this time does not work for a team, please send us an email in advance to set up an alternative time.

No late submissions shall be graded. If we meet you for a demo, we will use files that you submitted at or before 5PM on the 9th of May.

  • If your design has known failures, then bring to the demo a written short explanation for as many failures as you can track down. This will exponentially increase the points you will get, compared to simply showing up and saying we don't know the reason for the failures.
  • If your entire design does not work, then you may show me a demo of a partially complete processor. So in your best interest, snapshot working parts of your design as you add more functionality. For example, you may show me any one of the following, if your full pipeline+cache does not work.
    • Stalling instruction memory alone
    • Stalling data memory alone
    • Stalling inst+data memory
    • Direct-mapped instruction memory alone
    • Direct-mapped data memory alone
    • Direct-mapped inst+data memory
    • 2-way instruction memory alone
    • 2-way data memory alone
    • 2-way inst+data memory

Both partners are required to be present and both are expected to explain and answer questions about the whole design. Answering a question with: "I have no idea, my partner did that" is a failing answer. You must (at least) be able to answer: "My partner implemented that, but it works in the following way....".


11.  Final Project Report: May 10th (5% of project grade)

Due by 1:00pm on May 10th

Each team should turn in one final report that is typed, well written, and well organized. Semantic, spelling, or grammatical errors will be penalized.

  • Please check the template for details on what is required.

For writing the final report use this template if you use Word, or follow the format in this pdf.

FinalReport.doc, FinalReport.pdf.


Page last modified on May 04, 2016, visited 3049 times