Synthesis Tutorial
1. Overview
In this class we will do some very simple synthesis of your designs. The primary goal of this exercise is to get a sense for the actual hardware your Verilog is creating. Synthesizing your design will allow us to see:

	The actual gates
	Area
	Delay
	The actual longest path in your design
	 
		 
 
There is a lot more to synthesis optimizations than what we will
cover in this class. 
 
We will be using Synopsys Design Compiler (DC) and a 45nm gate library provided by FreePDK. A lot of the details will be abstracted away, and you will be using a simple script called synth.pl which will do most of the work for you.
 
To synthesize your design, several pieces of information are required:

	A gate library that says what types of gates are available at your disposal. We will use the FreePDK library installed at: /u/k/a/karu/courses/cs552/cad/Synopsys_Libraries/libs
	A list of Verilog files and the name of each module. Now you will begin to appreciate why the Verilog file name and the module name have to be identical
	The top-most module name
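Since the file name and the module name have to be identical, it can help to check this before running synthesis. The following is a minimal sketch in Python, not part of the course tools: the function name mismatched_files is hypothetical, and the regular expression is a simplification that does not skip comments or handle several modules per file in any special way.

```python
import re
from pathlib import Path

# Matches a Verilog module declaration such as "module alu (" or "module alu;".
# Simplification (assumption): comments and preprocessor directives are not handled.
MODULE_RE = re.compile(r"^\s*module\s+([A-Za-z_][A-Za-z0-9_$]*)", re.MULTILINE)


def mismatched_files(paths):
    """Return (file name, declared modules) pairs where no module matches the file name."""
    problems = []
    for path in map(Path, paths):
        modules = MODULE_RE.findall(path.read_text())
        # path.stem is the file name without the trailing ".v"
        if path.stem not in modules:
            problems.append((path.name, modules))
    return problems
```

An empty result means every file declares a module with the same name as the file.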
	 
 
Whatever is in the red box, you will *IGNORE* for this class and instead use the defaults we provide. All your designs will be synthesized to meet a 1 GHz clock frequency (1 ns clock period). The area goal is minimum area.
 
We will perform synthesis in the following three steps:

	Write the Verilog and verify it
	Check that the Verilog is synthesizable
	Synthesize the Verilog
	 
 
Before we can begin, we should set up environment variables and such, just like we did for ModelSim.
 
2. Environment setup
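If you have not already done so, add the directory containing synth.pl to your PATH. In your .bashrc.local, add the line:

export PATH=$PATH:/p/course/cs552-david/public/html/S12/handouts/bins/

or, in your .cshrc.local, add the line:

setenv PATH ${PATH}:/p/course/cs552-david/public/html/S12/handouts/bins/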
 
Then add the following line to your .cshrc.local:

source /s/synopsys/@sys/synopsys_env.csh

or add the following line to your .bashrc.local:

source /s/synopsys/@sys/synopsys_env.sh

Then download the file .synopsys_dc.setup and copy it into your home directory.

Note that a dot is the first character in the filename. Some browsers drop this dot when saving the file, so be careful: the copy in your home directory MUST have the dot as the first character.
 
	You MUST NOT modify this file
	You MUST NOT change its name
	You MUST copy the file into your home directory
 
IMPORTANT: Log out of the shell, then log back in. This will make sure the modifications to your .cshrc.local or .bashrc.local take effect.
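After logging back in, you may want to confirm that the two pieces of setup described above are in place. The sketch below is one way to do that in Python, under the assumption that it is enough to find .synopsys_dc.setup in the home directory and a file named synth.pl in some PATH directory. The function name check_setup is hypothetical, and the sketch does not verify that the Synopsys environment script was sourced.

```python
import os
from pathlib import Path


def check_setup(home, path_var):
    """Return a list of human-readable problems with the synthesis setup."""
    problems = []
    # The setup file must keep its leading dot and live in the home directory.
    setup = Path(home) / ".synopsys_dc.setup"
    if not setup.is_file():
        problems.append("missing " + str(setup))
    # synth.pl should be reachable through one of the PATH entries.
    on_path = any(
        (Path(d) / "synth.pl").is_file() for d in path_var.split(os.pathsep) if d
    )
    if not on_path:
        problems.append("synth.pl not found on PATH")
    return problems
```

For example, check_setup(Path.home(), os.environ.get("PATH", "")) returns an empty list when both checks pass.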
 
3. The synth.pl script
The synth.pl script is a
wrapper that will allow us to perform synthesis. It requires the
following information and has the following usage: 
 
3.1 Usage
Usage: synth.pl [options]
    Options:
     [-cmd <check|synth>] What to do:
                              check = just check if everything is ok
                              synth = perform synthesis (will take longer)
     [-type <other|proc>] What is the design:
                              proc = This is the processor.
                                     Use when synthesizing the full processor
                                     then -f, -d, -e, -m, -wb must be specified
                              other = Some other design (use for hw, caches etc.)
     [-top <module name>] Name of the top-most module in your design. This
                          must be the module instantiated inside the _hier level.
                          **You cannot specify the _hier module here**
     [-opt <yes|no> ]     Optimize the design yes or no. [Default = no]
     [-list <filename> ]  <filename> has a list of verilog files which make up
                          your design.
     [-file <f0,f1,f2,..> ] Provide a comma-separated list of verilog file names.
     Only one of -list or -file can be used
     [-f <fetch module>]  Name of your fetch module,
                          required if type=proc, else ignored
     [-d <decode module>] Name of your decode module,
                          required if type=proc, else ignored
     [-e <execute module>] Name of your execute module,
                          required if type=proc, else ignored
     [-m <memory module>] Name of your memory module,
                          required if type=proc, else ignored
     [-wb <write-back module>] Name of your write-back module,
                          required if type=proc, else ignored
3.2 Output files
Output:
     If cmd=check
       synth/hierarchy.txt  Hierarchy of your design
       synth/<top>.syn.v    Gate-level version of your design
                            All modules will be in ONE single
                            verilog file. Replace <top> with the
                            top module name you specified
                            as input.
     If cmd=synth
       The above two files, PLUS
       synth/report_reference.txt  Detailed usage of each module
       synth/area_report.txt       Detailed area report
       synth/timing_report.txt     Detailed timing report
3.3 Example usages
Checking the ALU from hw2/problem2
prompt> synth.pl --list=foo --type=other --cmd=check --top=alu
Synthesizing the ALU from hw2/problem2
prompt> synth.pl --list=foo --type=other --cmd=synth --top=alu
Checking the full processor for demo2
prompt> synth.pl --list=foo --type=proc --cmd=check --top=proc -f=fetch -d=decode -e=execute -m=memory -wb=write_back
This assumes your fetch module is called fetch, your decode module is called decode, and so on. Since we are specifying module names (and NOT file names), there is no .v at the end of these names.
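The rules in the usage text (only check or synth as the command, the five stage names required when type=proc, no _hier module as the top) can be captured in a small helper that builds the argument list. This is a sketch under those stated rules only. The function name synth_command and its parameters are hypothetical, it covers just the --list form, and it builds the command without running it.

```python
def synth_command(cmd, design_type, top, list_file, stages=None):
    """Build the synth.pl argument list in the --name=value style of the examples."""
    if cmd not in ("check", "synth"):
        raise ValueError("cmd must be 'check' or 'synth'")
    if design_type not in ("other", "proc"):
        raise ValueError("type must be 'other' or 'proc'")
    if top.endswith("_hier"):
        raise ValueError("the _hier wrapper cannot be the top module")
    args = [
        "synth.pl",
        "--list=" + list_file,
        "--type=" + design_type,
        "--cmd=" + cmd,
        "--top=" + top,
    ]
    if design_type == "proc":
        stages = stages or {}
        # The usage text requires all five stage module names for type=proc.
        for flag in ("f", "d", "e", "m", "wb"):
            if flag not in stages:
                raise ValueError("type=proc requires -" + flag)
            args.append("--%s=%s" % (flag, stages[flag]))
    return args
```

Joining the returned list with spaces gives a command line of the same shape as the examples above.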
4. Step-by-step tutorial
4.1 Synthesizing alu from hw2/problem2
	
Step 1: Set up the environment.
	 
	
Step 2: Go to the directory where you have all the Verilog files.

prompt> cd hw2/problem2
	 
	
Step 3: Make a list of the Verilog files that are part of the design. Create a text file with one Verilog filename per line.

	No testbenches should be included in this list
	No _hier files, which are wrappers, should be in this list
	clkrst.v should NOT be in the list, since we don't want to synthesize the clock generator

For example, list.txt with the following contents:

16_4mux.v
16_8mux.v
16CLA.v
24_12mux.v
28_14mux.v
32_16mux.v
32_8mux.v
4_1mux.v
4CLA.v
8_2mux.v
ALU.v
bshifter.v
bshift.v
logiALU.v
stage_blk.v
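The file list can also be generated rather than typed by hand. The sketch below applies the three exclusion rules from this step. The function names are hypothetical, and the testbench suffixes _bench.v and _tb.v are an assumption about how your testbenches are named, so adjust them to match your own files.

```python
from pathlib import Path


def design_files(directory, testbench_suffixes=("_bench.v", "_tb.v")):
    """Return the .v file names in directory that belong in the synthesis list."""
    names = []
    for path in sorted(Path(directory).glob("*.v")):
        name = path.name
        # Skip the clock generator and the _hier wrapper files.
        if name == "clkrst.v" or name.endswith("_hier.v"):
            continue
        # Skip testbenches (suffixes are an assumed naming convention).
        if name.endswith(tuple(testbench_suffixes)):
            continue
        names.append(name)
    return names


def write_list(directory, list_name="list.txt"):
    """Write one file name per line to list_name inside directory."""
    names = design_files(directory)
    Path(directory, list_name).write_text("\n".join(names) + "\n")
    return names
```

Review the generated list.txt before using it, since any helper file that is not part of the design would still be picked up.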
	 
	
Step 4:

a) First "check" the design.

prompt> synth.pl --list=list.txt --type=other --cmd=check --top=ALU

b) Look at the output on the screen and in synth.log, and see whether any errors are reported. If there are no errors, go to step e.

c) Fix the errors; ask the TA if you don't know the meaning of an error. See the List of Common synthesis errors page.

d) Go to step a.

e) Checking done.
	 
	
Step 5: Synthesis

prompt> synth.pl --list=list.txt --type=other --cmd=synth --top=ALU

Wait for some time.....

Go look in synth/*.txt. You will find an area report, a cell report, and a timing report.
	 
	
Step 6: Checking synthesis output

Make sure that no module in the synth/cell_report.txt file has a zero area. If you see a zero area for a module name, that module had some kind of Verilog coding error and did NOT get synthesized. You must fix this problem. Look in synth.log, search for that module's Verilog file name, and look for warnings.

Fix these warnings, rerun synthesis (step 5), and repeat this check.
	 
 
4.2 Synthesizing the full processor
We will do this slightly differently. For all other problems we allowed the synthesis tool to completely "flatten" the design. If we flatten the full processor, then reasoning about it and applying optimizations will be hard. So we will break it up into some large pieces and preserve the hierarchy at those levels. Specifically, we will preserve the fetch, decode, execute, memory, and writeback modules of your processor. Within each of those modules, we will let synthesis completely flatten the design. This is why, when you specify the --type=proc option, you must also specify the fetch, decode, execute, memory, and writeback module names.
 
	
Step 1: Set up the environment.
	 
	
Step 2: Go to the directory where you have all the Verilog files.

prompt> cd project
	 
	
Step 3: Make a list of the Verilog files that are part of the design. Create a text file with one Verilog filename per line. For example, list.txt with the following contents:

alu16bit.v
and_N.v
barrel_shifter_16b.v
branch_decision.v
carry_look_ahead.v
cla16bit.v
cla4bit.v
clkrst.v
control_path.v
data_path.v
decode3_8.v
decode_logic.v
Decode.v
dff_en.v
dff.v
Execute.v
proc.v
	 
	
Step 4:

a) First "check" the design.

prompt> synth.pl --list=list.txt --type=proc --cmd=check --top=proc --f=fetch --d=decode --e=execute --m=memory --wb=write_back

b) Look at the output on the screen and in synth.log, and see whether any errors are reported. If there are no errors, go to step e.

c) Fix the errors; ask the TA if you don't know the meaning of an error. See the List of Common synthesis errors page.

d) Go to step a.

e) Checking done.
	 
	
Step 5: Synthesis

prompt> synth.pl --list=list.txt --type=proc --cmd=synth --top=proc --f=fetch --d=decode --e=execute --m=memory --wb=write_back

Wait for some time.....

Go look in the synth/ directory. You will find an area report, a timing report, proc.syn.v, a cell report, and a reference report.
	 
	
Step 6: Checking synthesis output

Make sure that no module in the synth/cell_report.txt file has a zero area. If you see a zero area for a module name, that module had some kind of Verilog coding error and did NOT get synthesized. You must fix this problem. Look in synth.log, search for that module's Verilog file name, and look for warnings.

Fix these warnings, rerun synthesis (step 5), and repeat this check.
	 
 
4.3 Interpreting the output files (all in synth/)
hierarchy.txt
This file describes your design hierarchy in text format. It starts from the top-level module. For each module it shows the list of sub-modules, and for each sub-module, the sub-sub-modules, and so on. An example is shown below:
 
alu
    GTECH_AND2                                   gtech
    GTECH_NOT                                    gtech
    GTECH_OR2                                    gtech
    GTECH_XOR2                                   gtech
    barrelshifter
        bit1_shifter
            mux4_1
                mux2_1
                    GTECH_BUF                    gtech
                    GTECH_NOT                    gtech
        bit2_shifter
            mux4_1
                ...
        bit4_shifter
            mux4_1
                ...
        bit8_shifter
            mux4_1
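The indentation in hierarchy.txt encodes the nesting depth, so the file is easy to process with a short script. The sketch below assumes four spaces per level, as in the example above, and assumes that library cells carry a second column (such as gtech) while your own modules do not. The function names are hypothetical.

```python
def parse_hierarchy(text, indent=4):
    """Return (depth, name, library) tuples; library is None for user modules."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "..." elision markers.
        if not stripped or stripped == "...":
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        fields = stripped.split()
        library = fields[1] if len(fields) > 1 else None
        entries.append((depth, fields[0], library))
    return entries


def user_modules(entries):
    """Names of modules without a library column, in first-seen order."""
    seen = []
    for _, name, library in entries:
        if library is None and name not in seen:
            seen.append(name)
    return seen
```

Comparing the result of user_modules against the modules you expect is a quick way to spot a module that never made it into the design.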
synth.log
This is the log of all
synthesis commands. Specifically look in this file for warnings and
errors if your design does not synthesize. 
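Because synth.log can be long, a small filter helps when looking for warnings and errors. The sketch below assumes that each message starts with the prefix "Warning:" or "Error:"; check a few lines of your own log to confirm this before relying on it. The function name log_issues and the sample messages in the test are hypothetical.

```python
def log_issues(log_text, keyword=None):
    """Return (line_number, line) pairs for warnings and errors in a log.

    If keyword is given (for example a Verilog file name), only messages
    containing it are returned.
    """
    issues = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        text = line.strip()
        # Assumed message format: lines beginning with "Warning:" or "Error:".
        if not text.startswith(("Warning:", "Error:")):
            continue
        if keyword is None or keyword in text:
            issues.append((number, text))
    return issues
```

Passing the file name of a module that shows a zero area as the keyword narrows the output to the messages about that file.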
 
area_report.txt
This file includes a report on the area occupied by your design. The file is mostly self-explanatory. The cell area is expressed in square microns. An example file is shown below:
 
Library(s) Used:
    gscl45nm (File: /scratch/users/karu/courses/cs755/tools/Synopsys_Libraries/libs/gscl45nm.db)
Number of ports:                3
Number of nets:               660
Number of cells:               15
Number of references:          12
Combinational area:       17600.626691
Noncombinational area:    2433.320446
Net Interconnect area:      undefined  (No wire load specified)
Total cell area:          20033.947137
Total area:                 undefined
The value on the "Total cell area:" line is the actual cell area of your design.
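In the example above, the total cell area is the sum of the combinational and noncombinational areas (17600.626691 + 2433.320446 = 20033.947137). A minimal Python sketch for pulling these numbers out of the report follows. It assumes the "label: number" layout shown above, and the function name parse_area_report is hypothetical.

```python
import re


def parse_area_report(text):
    """Return a dict of the numeric "label: value" fields in an area report."""
    values = {}
    for line in text.splitlines():
        # Only lines of the form "Some label:   123.456" are kept; lines whose
        # value is not a plain number (for example "undefined") are skipped.
        match = re.match(r"^\s*([^:]+):\s+([0-9.]+)\s*$", line)
        if match:
            values[match.group(1).strip()] = float(match.group(2))
    return values
```

With the example report, parse_area_report(text)["Total cell area"] gives the cell area in square microns.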
 
timing_report.txt
This file will contain the list of the top 20 longest (slowest) paths in your design. For each such path you will see the startpoint, the endpoint, and the list of gates that make up the path. Recall that all your designs will be synthesized to meet a 1 GHz clock frequency (1 ns clock period). For example:
 
  Startpoint: dx_reg/dff0[106]/dff0/state_reg
              (rising edge-triggered flip-flop clocked by clk)
  Endpoint: xm_reg/dff0[62]/dff0/state_reg
            (rising edge-triggered flip-flop clocked by clk)
  Path Group: clk
  Path Type: max
  Point                                                   Incr       Path
  --------------------------------------------------------------------------
  clock clk (rise edge)                                   0.00       0.00
  clock network delay (ideal)                             0.00       0.00
  dx_reg/dff0[106]/dff0/state_reg/CLK (DFFPOSX1)          0.00 #     0.00 r
  dx_reg/dff0[106]/dff0/state_reg/Q (DFFPOSX1)            0.13       0.13 f
  dx_reg/dff0[106]/dff0/q (dff_264)                       0.00       0.13 f
  dx_reg/dff0[106]/q (dff_en_264)                         0.00       0.13 f
  dx_reg/Out<106> (register_N_N114)                       0.00       0.13 f
  ex_stage/reg_rs_dx<2> (Execute)                         0.00       0.13 f
  ex_stage/U225/Y (INVX1)                                 0.02       0.15 r
  ex_stage/U224/Y (NAND2X1)                               0.01       0.16 f
  ex_stage/U227/Y (AND2X2)                                0.04       0.20 f
  ex_stage/U228/Y (INVX1)                                 0.00       0.19 r
  ex_stage/U231/Y (AND2X2)                                0.03       0.22 r
  ex_stage/U242/Y (INVX1)                                 0.02       0.24 f
  ex_stage/U232/Y (NOR2X1)                                0.02       0.26 r
  ex_stage/forward/C47/Z_0 (*SELECT_OP_4.1_4.1_1)         0.00       0.26 r
  ex_stage/U223/Y (OR2X1)                                 0.03       0.29 r
  ex_stage/forward_a_mux/mux0[0]/mux2/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.29 r
  ex_stage/U538/Y (INVX1)                                 0.01       0.30 f
  ex_stage/U537/Y (NAND2X1)                               0.01       0.31 r
  ex_stage/U448/Y (AND2X2)                                0.04       0.35 r
  ex_stage/U378/Y (XOR2X1)                                0.03       0.38 f
  ex_stage/U318/Y (INVX1)                                 0.00       0.39 r
  ex_stage/U434/Y (AND2X2)                                0.03       0.42 r
  ex_stage/U435/Y (INVX1)                                 0.02       0.43 f
  ex_stage/U584/Y (OAI21X1)                               0.05       0.49 r
  ex_stage/U601/Y (OAI21X1)                               0.03       0.51 f
  ex_stage/U390/Y (AND2X2)                                0.03       0.55 f
  ex_stage/U442/Y (INVX1)                                 0.00       0.54 r
  ex_stage/U417/Y (AND2X2)                                0.03       0.57 r
  ex_stage/U418/Y (INVX1)                                 0.01       0.59 f
  ex_stage/U276/Y (AND2X2)                                0.04       0.63 f
  ex_stage/U338/Y (XOR2X1)                                0.02       0.65 r
  ex_stage/alu/mux1/mux0[5]/mux0/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.65 r
  ex_stage/alu/mux1/mux0[5]/mux2/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.65 r
  ex_stage/alu/mux0/mux0[5]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.65 r
  ex_stage/alu/mux10/mux0[5]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.65 r
  ex_stage/U252/Y (OR2X2)                                 0.03       0.69 r
  ex_stage/U251/Y (INVX1)                                 0.01       0.70 f
  ex_stage/U250/Y (AND2X2)                                0.03       0.74 f
  ex_stage/U247/Y (AND2X2)                                0.03       0.77 f
  ex_stage/U246/Y (AND2X2)                                0.03       0.80 f
  ex_stage/U10/Y (AND2X2)                                 0.03       0.83 f
  ex_stage/U244/Y (AND2X2)                                0.03       0.87 f
  ex_stage/U257/Y (AND2X2)                                0.03       0.90 f
  ex_stage/U258/Y (AND2X2)                                0.03       0.93 f
  ex_stage/U261/Y (AND2X2)                                0.04       0.97 f
  ex_stage/U265/Y (INVX1)                                 0.00       0.96 r
  ex_stage/U262/Y (NAND2X1)                               0.01       0.97 f
  ex_stage/alu/mux7/C11/Z_0 (*SELECT_OP_2.1_2.1_1)        0.00       0.97 f
  ex_stage/alu/mux6/C11/Z_0 (*SELECT_OP_2.1_2.1_1)        0.00       0.97 f
  ex_stage/alu/mux5/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.97 f
  ex_stage/alu/mux4/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.97 f
  ex_stage/alu/mux3/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)
                                                          0.00       0.97 f
  ex_stage/ALU_out<0> (Execute)                           0.00       0.97 f
  xm_reg/In<62> (register_N_N92)                          0.00       0.97 f
  xm_reg/dff0[62]/d (dff_en_128)                          0.00       0.97 f
  xm_reg/dff0[62]/U3/Y (INVX1)                            0.00       0.97 r
  xm_reg/dff0[62]/U2/Y (MUX2X1)                           0.02       0.99 f
  xm_reg/dff0[62]/dff0/d (dff_128)                        0.00       0.99 f
  xm_reg/dff0[62]/dff0/U3/Y (AND2X1)                      0.03       1.02 f
  xm_reg/dff0[62]/dff0/state_reg/D (DFFPOSX1)             0.00       1.02 f
  data arrival time                                                  1.02
  clock clk (rise edge)                                   1.00       1.00
  clock network delay (ideal)                             0.00       1.00
  xm_reg/dff0[62]/dff0/state_reg/CLK (DFFPOSX1)           0.00       1.00 r
  library setup time                                     -0.06       0.94
  data required time                                                 0.94
  --------------------------------------------------------------------------
  data required time                                                 0.94
  data arrival time                                                 -1.02
  --------------------------------------------------------------------------
  slack (VIOLATED)                                                  -0.08
In the above example, there are about 40 to 50 gates on the path. Right at the end, notice the string slack (VIOLATED). The data arrives at 1.02 ns but is required by 0.94 ns, so the path takes 0.08 ns longer than it should. You should try optimizing it. The names of the gates and their prefixes (for example ex_stage/) give you a hint as to which stage of the pipeline this logic belongs to.
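The slack is the data required time minus the data arrival time, here 0.94 - 1.02 = -0.08 ns. The sketch below reads those three numbers from the first path in a timing report and estimates the shortest clock period that path could meet, assuming the setup time and everything else stay the same. The function names are hypothetical, and the parsing assumes the line labels shown in the example above.

```python
def parse_slack(report_text):
    """Return (arrival, required, slack) in ns for the first path in the report."""
    arrival = required = slack = None
    for line in report_text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        if not fields:
            continue
        # Each label appears more than once per path; keep the first occurrence.
        if stripped.startswith("data arrival time") and arrival is None:
            arrival = abs(float(fields[-1]))
        elif stripped.startswith("data required time") and required is None:
            required = float(fields[-1])
        elif stripped.startswith("slack") and slack is None:
            slack = float(fields[-1])
    return arrival, required, slack


def min_clock_period(clock_period_ns, slack_ns):
    """Shortest clock period this path could meet if nothing else changed."""
    return clock_period_ns - slack_ns
```

For the example path, min_clock_period(1.0, -0.08) is about 1.08 ns, which corresponds to roughly 926 MHz instead of the 1 GHz target.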
 
references_report.txt
This file will show you all
the low-level modules that ended up in your design. It will show you
how many times each such cell was instantiated. For example: 
 
Reference          Library       Unit Area   Count    Total Area   Attributes
-----------------------------------------------------------------------------
AND2X1             gscl45nm       2.346500       5     11.732500
AND2X2             gscl45nm       2.815800      15     42.236999
BUFX2              gscl45nm       2.346500      15     35.197499
BUFX4              gscl45nm       2.815800       3      8.447400
INVX1              gscl45nm       1.407900      22     30.973799
INVX2              gscl45nm       1.877200       8     15.017600
INVX4              gscl45nm       3.285100       1      3.285100
INVX8              gscl45nm       3.285100       4     13.140400
NOR2X1             gscl45nm       2.346500       1      2.346500
OAI21X1            gscl45nm       2.815800       1      2.815800
OR2X1              gscl45nm       2.346500       1      2.346500
OR2X2              gscl45nm       2.815800      28     78.842399 
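In each row, the total area is the unit area multiplied by the count (for example, AND2X1: 2.346500 x 5 = 11.732500). A short Python sketch that reads the table and lets you find the cells contributing the most area is shown below. It assumes the column order of the example above, and the function name parse_references is hypothetical.

```python
def parse_references(text):
    """Return {cell: (unit_area, count, total_area)} from a references report."""
    cells = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: Reference, Library, Unit Area, Count, Total Area.
        if len(fields) < 5:
            continue
        try:
            unit, count, total = float(fields[2]), int(fields[3]), float(fields[4])
        except ValueError:
            continue  # header or separator line
        cells[fields[0]] = (unit, count, total)
    return cells
```

For instance, max(cells, key=lambda c: cells[c][2]) names the cell type with the largest total area, which is OR2X2 in the example.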
cell_report.txt
This file will provide the individual areas of every module synthesized. If you see any module with a zero area in this file, it means that module was NOT synthesized correctly. The format of this file is similar to the references_report.txt file.
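The zero-area check can be automated. Since no example of cell_report.txt is shown here, the sketch below makes a loose assumption about its layout: each row starts with a name and contains one or more decimal numbers, and a row is suspicious if any of those numbers is zero. The function name zero_area_rows is hypothetical, and you should compare its output against the actual file before trusting it.

```python
import re

# A decimal number such as 2.346500 or 0.000000 (assumed area format).
AREA_RE = re.compile(r"^\d+\.\d+$")


def zero_area_rows(report_text):
    """Return the first-column names of rows that contain a zero-valued area."""
    flagged = []
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Collect every decimal-looking field after the name column.
        areas = [float(f) for f in fields[1:] if AREA_RE.match(f)]
        if areas and any(a == 0.0 for a in areas):
            flagged.append(fields[0])
    return flagged
```

An empty list suggests that every module received some area; any name returned is a module to look up in synth.log.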
 
.syn.v file
This file contains the
synthesized structural netlist of your design. 
 
4.4 Optimizing your design - make it faster and smaller
Thus far we have been synthesizing your design preserving its hierarchy. That is, if you said to build a barrel shifter using mux -> shift -> mux -> shift, then synthesis will blindly create a hardware module for each such individual module you specified.
 
You can guide synthesis into
"flattening" your design, i.e. treat everything between two
flip-flops as raw combinational logic and simply create the most
efficient logic gates to implement this. When you do this process,
you will see your hierarchical design of the datapath completely
disappear. 
 
You can do this by adding the --opt option to synth.pl. For example:

prompt> synth.pl --list=list.txt --type=other --cmd=synth --top=ALU --opt yes
prompt> synth.pl --list=list.txt --type=proc --cmd=synth --top=proc --f=fetch --d=decode --e=execute --m=memory --wb=write_back --opt yes
 
   