March 2001
Copyright (c) 2001 by Shing Ip Kong. All Rights Reserved.
Filed at: http://www.cs.wisc.edu/~markhill/kong
Better-formatted revision at: http://www.cis.upenn.edu/~milom/elements-of-logic-design-style/
Appendix A: http://www.cs.wisc.edu/~markhill/kong/appendixA.html
Appendix B: http://www.cs.wisc.edu/~markhill/kong/appendixB.html
Appendix C: http://www.cs.wisc.edu/~markhill/kong/appendixC.html
$Source: /proj/gemini/cvs_root/P2002/Notes/Style/main_text,v $
$Date: 2001/12/06 21:49:07 $
$Revision: 1.1 $
$Id: main_text,v 1.1 2001/12/06 21:49:07 kong Exp $

A copy of this file is kept at: /home/kong/P2002/Notes/Style/main_text

1. Introduction
-----------------------------------------------------------------------------

The goal of this document is to summarize some ideas I find useful in logic
design and Verilog coding (Note 1). Logic design is not the same as Verilog
coding. One common mistake of some inexperienced logic designers is to treat
logic design as a Verilog programming task. This often results in Verilog
code that is hard to understand, hard to implement, and hard to debug.

Logic design is a process:
   1. Understand the problem.
   2. If necessary, divide the problem into multiple modules with clean and
      well-defined interfaces.
   3. For each module:
      a. Design the datapath that can process the data for that module.
      b. Design the controller to control the datapath and produce control
         outputs (if any) to other adjacent modules.

Verilog coding, on the other hand, is a modeling task. More specifically,
after one has done some preliminary designs on the datapaths and controllers,
Verilog code is then used to:
   1. Model the datapaths and the controllers.
   2. Connect the datapath and controller together to form modules.
   3. Connect the modules together to form the final design.

Note 1: Verilog is used as an example in this document. The ideas discussed
        in this document, however, should also be applicable to other
        Hardware Description Languages (such as VHDL) with minor adjustments.

The rest of this document is organized as follows:

Section 2 discusses the most important rule of logic design: keep it easy to
understand. This section also introduces some basic Verilog coding guidelines.

Section 3 discusses the art of dividing a design into high-level modules and
then how these modules can be divided into datapaths and controllers.
Section 4 discusses the logic design and Verilog coding guidelines for the
datapath.

Section 5 discusses the logic design and Verilog coding guidelines for the
controller.

Section 6 discusses some miscellaneous Verilog coding guidelines.

Section 7 is a summary of all the logic design and Verilog coding guidelines
introduced in this document. This summary serves as a quick reference for
readers who either:
   (a) may not have the time to read this entire document, or
   (b) have already read this document once but want a quick reminder
       later on.

Throughout this document, I have listed many Verilog files from my home
directory (/home/kong/P2001/... ) as examples. They are listed here as
references only. Readers do not need to read them to understand the key
points of this document because I have already included the Verilog code I
want to use as examples throughout this document. Furthermore, the example
Verilog files that model a module, a datapath, and a controller are included
in Appendix A, Appendix B, and Appendix C for those readers who are
interested in looking at the structure of a complete Verilog file.

2. The Most Important Rule of Logic Design & Basic Verilog Coding Guidelines
-----------------------------------------------------------------------------

The most important logic design rule is more a philosophy than a rule :-)

*** Logic Design Guideline 2-1 (MOST IMPORTANT) ***
    The design MUST be as simple as possible and easy to understand!

If a design is hard to understand, then nobody will be able to help the
original designer with his or her work. Also, as time passes, a
hard-to-understand design will become impossible to maintain and debug even
for the original designer. Therefore, a logic designer must keep his or her
design simple and easy to understand even if that means the design is
slightly bigger or slightly slower, as long as the design is still small
enough and fast enough to meet the specification.
One important step in keeping a design simple and the Verilog code that
models the design easy to understand is to use standard logic elements such
as registers, multiplexers, decoders, etc. Consequently, the first step in
any Verilog coding project is:

*** Verilog Coding Guideline 2-1 ***
    Model all the standard logic elements in a library file to be SHARED
    by ALL engineers in the design team.

For an example of such a library, see:
    /home/kong/P2001/Verilog/CommonFiles/sata_library.v

*** For those readers who do not have access to my home directory, ***
*** don't worry. I will include the important Verilog code I want  ***
*** to use as examples throughout this document. Furthermore, in   ***
*** Appendix A, Appendix B, and Appendix C are examples of Verilog ***
*** files that model a module, a datapath and a controller.        ***

Below are some examples of the basic logic elements defined in
sata_library.v:

/***************************************************************
 * Simple N-bit register with a 1 time-unit clock-to-q time
 ***************************************************************/
module v_reg( q, c, d );
    parameter n = 1;

    output [n-1:0] q;
    input  [n-1:0] d;
    input          c;

    reg    [n-1:0] state;

    assign #(1) q = state;

    always @(posedge c) begin
        state = d;
    end

endmodule // v_reg

/***************************************************************
 * Simple N-bit latch with a 1 time-unit clock-to-q time
 ***************************************************************/
module v_latch ( q, c, d );
    parameter n = 1;

    output [n-1:0] q;
    input  [n-1:0] d;
    input          c;

    reg    [n-1:0] state;

    assign #(1) q = state;

    always @(c or d) begin
        if (c) begin
            state = d;
        end
    end

endmodule // v_latch

/***************************************************************
 * Simple N-bit 2-to-1 Multiplexer
 ***************************************************************/
module v_mux2e( z, s, a, b );
    parameter n = 1;

    output [n-1:0] z;
    input  [n-1:0] a, b;
    input          s;

    assign z = s ? b : a ;      // s=1, z<-b; s=0, z<-a

endmodule

One key observation from the logic elements defined in sata_library.v is:

*** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero
    clock-to-q time. All combinational logic (example: mux) has zero delay.

The non-zero clock-to-q time of the storage elements will prevent hold time
problems at all registers' inputs. In general, a logic designer must NOT
rely on a combinational logic block to have a certain minimum delay. The
zero delay in the Verilog model of the combinational logic elements ensures
that the logic designer does not rely on any minimum delay during
simulation.

Once the basic logic elements have been modeled in the library file:

*** Verilog Coding Guideline 2-3 ***
    Use explicit registers and latches (example: v_reg and v_latch as
    defined in the examples above) in your Verilog coding. Do not rely on
    logic synthesis tools to generate latches or registers for you.

By making the logic designer explicitly place the registers and/or latches,
the logic designer is forced to consider the timing implications of the
logic early in the design cycle. In other words, the designer is forced to
ask himself or herself questions such as: do I have so much logic between
registers that it may not meet the cycle time?

Also, with explicit registers and latches in the Verilog code, it will be
much easier for those who read the code to draw a simple block diagram
showing all the registers in the design. Such a block diagram (see Section 4
and Section 5) is very useful in terms of understanding the design (remember
the MOST important Logic Design Guideline above: the design must be easy to
understand) as well as making timing tradeoffs when such tradeoffs are
necessary.
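
To make Verilog Coding Guideline 2-3 concrete, here is a minimal sketch of
what explicit register placement can look like. The v_reg and v_mux2e models
are repeated from the sata_library.v excerpts above so that the example
stands on its own. The wrapper module hold_reg8 and its port names are
hypothetical; they are made up for illustration and are not taken from the
original design.

```verilog
// Library elements repeated from the sata_library.v excerpts above.
module v_reg( q, c, d );
    parameter n = 1;
    output [n-1:0] q;
    input  [n-1:0] d;
    input          c;
    reg    [n-1:0] state;
    assign #(1) q = state;          // 1 time-unit clock-to-q
    always @(posedge c) begin
        state = d;
    end
endmodule // v_reg

module v_mux2e( z, s, a, b );
    parameter n = 1;
    output [n-1:0] z;
    input  [n-1:0] a, b;
    input          s;
    assign z = s ? b : a ;          // s=1, z<-b; s=0, z<-a
endmodule // v_mux2e

// HYPOTHETICAL example (not from the original design):
// an 8-bit register that loads din when load=1 and holds its value otherwise.
module hold_reg8( q, clk, load, din );
    output [7:0] q;
    input        clk;
    input        load;
    input  [7:0] din;

    wire   [7:0] next;              // value to be captured at the next edge

    // Zero-delay combinational logic: choose between "hold" and "load".
    v_mux2e #(8) load_mux (next, load, q, din);

    // The one and only storage element, placed explicitly.
    v_reg   #(8) data_ff  (q, clk, next);
endmodule // hold_reg8
```

Because the register appears as a named instance (data_ff), a reader can
count the registers directly from the instance list, and the only logic in
front of it is the zero-delay multiplexer.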
At first glance, it seems ironic that the logic designer needs to always
keep in mind how much combinational logic exists between any two storage
elements (registers or latches) while in Verilog coding (see Verilog Coding
Guideline 2-2), we want to treat all combinational logic as having zero
delay.

The reason for this apparent contradiction is that in logic design, the
delay of the combinational logic between storage elements determines the
cycle time. Consequently, it is important for the logic designer to be aware
of the complexity of the logic between two storage components at all times.

On the other hand, in order to reduce potential hold time problems, we also
do not want the correct operation of the logic to depend on the logic having
a certain minimum delay. The best way to make sure the logic can operate
correctly without relying on the combinational logic blocks to have a
certain minimum delay is to run the Verilog simulation with all
combinational logic blocks having zero delay and rely on the storage
elements' (registers and/or latches) non-zero clock-to-q time to satisfy the
hold time requirement of the next register.

3. Hierarchical Design and Clock Domain Consideration
-----------------------------------------------------------------------------

Another important step in keeping a design simple and the Verilog code that
models the design easy to understand is to adopt a hierarchical approach to
the design process and then make the Verilog code follow the same hierarchy.

Hierarchical design, however, should not be carried to an extreme. For
example, as pointed out by one of my colleagues, Kyutaeg Oh [1], too deep a
hierarchy can cause too many module instantiations, which will cause
synthesis to run too slowly. Below is a hierarchical design strategy I find
useful.

*** Logic Design Guideline 3-1 ***
    Use a hierarchical strategy that breaks the design into modules that
    consist of datapaths and controllers. More specifically:
    1. Divide the problem into multiple modules with clean and well-defined
       interfaces.
    2. For each module:
       a. Design the datapath that can process the data for that module.
       b. Design the controller to control the datapath and produce control
          outputs (if any) to other adjacent modules.

One example of such a hierarchical approach can be found in the Serial ATA
to Parallel ATA Converter for the Disk (Device Dongle). As shown in
Figure 3-1, the Device Dongle is divided into three modules:
   1. The Parallel ATA Interface to the disk: ATAIF. See Reference [2].
   2. The Transport Layer: Transport. See Reference [3].
   3. The Link Layer: Link. See Reference [4].

          +-----------+   +-------------+   +--------+
          |           |   |             |   |        |
 /-------\| Parallel  |--->  Transport  |--->  Link  +--> To Serializer
< ATA Bus >    ATA    |   |    Layer    |   |  Layer |
 \-------/| Interface |<--+             |<--+        |
          |  (ATAIF)  |   | (Transport) |   | (Link) <--- From Deserializer
          |           |   |             |   |        |
          +-----------+   +-------------+   +--------+

       Figure 3-1: The Three Modules that Form the Device Dongle

The Parallel ATA Interface (ATAIF), the Transport Layer (Transport) and the
Link Layer (Link) shown in Figure 3-1 are further divided into datapath and
controller modules as described below and shown in Figure 3-2.

                       +----------------------+   +----------------------+
                       |   Transport Layer    |   |      Link Layer      |
                       |        dtrans        |   |         link         |
                       | +------------------+ |   | +------------------+ |
                       | | Transmit Engine  | |   | | Transmit Engine  | |
                       | |    dtrans_tx     | |   | |     link_tx      | |
                       | | +--------------+ | |   | | +--------------+ | |
                       | | |   Datapath   | | |   | | |   Datapath   | | |
                       | | | dtrans_txdp  | | |   | | |  link_txdp   | | |
+------------------+   | | +--------------+ | |   | | +--------------+ | |
|   Parallel ATA   |   | |                  | |   | |                  | |
|    Interface     |   | | +--------------+ | |   | | +--------------+ | |
|      dataif      |   | | |  Controller  | | |   | | |  Controller  | | |
| +-----------+    |   | | | dtrans_txctl | | |   | | |  link_txctl  | | |
| | Datapath  |    |   | | +--------------+ | |   | | +--------------+ | |
| |           |    |   | |                  | |   | |                  | |
| | dataif_dp |    |   | | +--------------+ | |   | | +--------------+ | |
| +-----------+    |   | | | Synchronizer | | |   | | | Synchronizer | | |
|                  |   | | | dtrans_txsyn | | |   | | |  link_txsyn  | | |
|                  |   | | +------------^-+ | |   | | +------------^-+ | |
| +------------+   |   | +---+----------|---+ |   | +---+----------|---+ |
| | Controller |   |   |     |(3)       |(3)  |   |     |(1)       |(2)  |
| |            |   |   | +---|----------+---+ |   | +---|----------+---+ |
| | dataif_ctl |   |   | | +-v------------+ | |   | | +-v------------+ | |
| +------------+   |(4)| | | Synchronizer | | |   | | | Synchronizer | | |
|                  +------->              | | |   | | |              | | |
|                  |   | | | dtrans_rxsyn | | |   | | |  link_rxsyn  | | |
| +--------------+ |   | | +--------------+ | |   | | +--------------+ | |
| | Synchronizer | |(5)| | +--------------+ | |   | | +--------------+ | |
| |              <-----+ | |  Controller  | | |   | | |  Controller  | | |
| |  dataif_syn  | |   | | | dtrans_rxctl | | |   | | |  link_rxctl  | | |
| +--------------+ |   | | +--------------+ | |   | | +--------------+ | |
+------------------+   | |                  | |   | |                  | |
                       | | +--------------+ | |   | | +--------------+ | |
                       | | |   Datapath   | | |   | | |   Datapath   | | |
                       | | | dtrans_rxdp  | | |   | | |  link_rxdp   | | |
                       | | +--------------+ | |   | | +--------------+ | |
                       | |  Receive Engine  | |   | |  Receive Engine  | |
                       | |    dtrans_rx     | |   | |     link_rx      | |
                       | +------------------+ |   | +------------------+ |
                       +----------------------+   +----------------------+

         Figure 3-2: Further Divisions of the Device Dongle Modules

The Parallel ATA Interface, modeled by the "module dataif" in the Verilog
file dataif.v (see Reference [2]), is further divided into the following
(see Reference [5]):
   Datapath:     module dataif_dp  in the Verilog file dataif_dp.v
   Controller:   module dataif_ctl in the Verilog file dataif_ctl.v
   Synchronizer: module dataif_syn in the Verilog file dataif.v

The Transport Layer, modeled by the "module dtrans" in the Verilog file
dtrans.v (see Reference [3]), is further divided into the following (see
Reference [6]):
   Transmit Engine: module dtrans_tx in the Verilog file dtrans_tx.v
      This Transport Layer Transmit Engine is further divided into:
      Datapath:     module dtrans_txdp  in the Verilog file dtrans_txdp.v
      Controller:   module dtrans_txctl in the Verilog file dtrans_txctl.v
      Synchronizer: module dtrans_txsyn in the Verilog file dtrans_tx.v
   Receive Engine: module dtrans_rx in the Verilog file dtrans_rx.v
      This Transport Layer Receive Engine is further divided into:
      Datapath:     module dtrans_rxdp  in the Verilog file dtrans_rxdp.v
      Controller:   module dtrans_rxctl in the Verilog file dtrans_rxctl.v
      Synchronizer: module dtrans_rxsyn in the Verilog file dtrans_rx.v

Similarly, the Link Layer, modeled by the "module link" in the Verilog file
link.v (see Reference [4]), is further divided into the following (see
Reference [7]):
   Transmit Engine: module link_tx in the Verilog file link_tx.v
      This Link Layer Transmit Engine is further divided into:
      Datapath:     module link_txdp  in the Verilog file link_txdp.v
      Controller:   module link_txctl in the Verilog file link_txctl.v
      Synchronizer: module link_txsyn in the Verilog file link_tx.v
   Receive Engine: module link_rx in the Verilog file link_rx.v
      This Link Layer Receive Engine is further divided into:
      Datapath:     module link_rxdp  in the Verilog file link_rxdp.v
      Controller:   module link_rxctl in the Verilog file link_rxctl.v
      Synchronizer: module link_rxsyn in the Verilog file link_rxsyn.v

For those readers who have access to my home directory and are interested in
taking a closer look at the Verilog files discussed above, please refer to
References [2 to 7]. However, the detailed contents of these Verilog files
are not needed to illustrate the following Verilog Coding Guideline:

*** Verilog Coding Guideline 3-1 ***
    A separate Verilog file is assigned to the Verilog code for:
    1. Each datapath. Example: dtrans_txdp.v
    2. Each controller. Example: dtrans_txctl.v
    3. Each high level module, that is, a module at a hierarchy level higher
       than the datapath and the controller.
       Examples: link_tx.v, link_rx.v, and link.v

A corollary of the above Verilog Coding Guideline is as follows:

*** Verilog Coding Guideline 3-2 ***
    In order to keep the number of Verilog files under control, one should
    try not to assign a separate Verilog file to any low level module that
    is at a hierarchy level lower than the datapath and the controller.

For example, as I will show you in Section 4, the datapath will contain many
datapath elements. Instead of assigning a separate Verilog file to each of
these datapath elements, the datapath elements are all grouped into a single
"library" file (link_library.v).

Similarly, as I will show you in Section 5, the controller will contain a
"Next State Logic" block and an "Output Logic" block. Instead of assigning a
separate Verilog file to each logic block, the logic blocks will be included
in the Verilog file assigned to the controller.

Enclosed in Appendix A are the Verilog files dtrans.v, dtrans_tx.v, and
dtrans_rx.v.
Here is something worth noticing:

*** Verilog Coding Guideline 3-3 ***
    The Verilog code for a high level module, that is, a module at a
    hierarchy level higher than the datapath and the controller (examples:
    module dtrans_tx, module dtrans_rx, and module dtrans), should not
    contain any logic. It should only show how the lower level modules are
    connected.

For example, if you look at the dtrans.v file in Appendix A, the "module
dtrans" only shows how its transmit engine (dtrans_tx) and its receive
engine (dtrans_rx) are connected. Similarly, if you look at the dtrans_tx.v
file in Appendix A, the "module dtrans_tx" contains only the information on
how its datapath (dtrans_txdp), its controller (dtrans_txctl), and its
synchronizer (dtrans_txsyn) are connected together. In any case, none of the
"module dtrans," the "module dtrans_tx," and the "module dtrans_rx" contains
any Verilog code that models raw logic.

Notice from Figure 3-2 that the ATA Interface module is divided directly
into the datapath and the controller. On the other hand, the Transport Layer
and the Link Layer are first partitioned into the Transmit Engine and the
Receive Engine before being further divided into controller and datapath.

The reason for this extra level of hierarchy for the Transport Layer and the
Link Layer is that their Transmit Engines and their Receive Engines work in
different clock domains. More specifically, the ATA Interface, the Transmit
Engine of the Link Layer, and the Transmit Engine of the Transport Layer all
operate under the same clock, the transmit clock, while the Receive Engines
of the Link and Transport Layers both operate on a different clock, the
receive clock. This leads to the following design guideline:

*** Logic Design Guideline 3-2 ***
    Keep different clock domains separate and have an explicit
    synchronization module for signals that cross the clock domain.
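
The internals of the synchronization modules are not shown in this document.
One common way to build such a module for a single-bit, level-type signal is
two back-to-back registers clocked by the destination clock, and the sketch
below assumes that technique rather than describing the actual link_rxsyn or
dtrans_rxsyn. The module name sync2_sketch and its port names are
hypothetical; v_reg is repeated from the sata_library.v excerpt in Section 2
so that the sketch is self-contained. A multi-bit bus generally needs more
than this, so please treat it as an illustration only.

```verilog
// Register model repeated from the sata_library.v excerpt in Section 2.
module v_reg( q, c, d );
    parameter n = 1;
    output [n-1:0] q;
    input  [n-1:0] d;
    input          c;
    reg    [n-1:0] state;
    assign #(1) q = state;          // 1 time-unit clock-to-q
    always @(posedge c) begin
        state = d;
    end
endmodule // v_reg

// HYPOTHETICAL two-stage synchronizer for ONE single-bit level signal.
// ASSUMPTION: this is a generic technique, not the author's synchronizer.
module sync2_sketch( sync_out, dst_clk, async_in );
    output sync_out;                // safe to use in the dst_clk domain
    input  dst_clk;                 // clock of the receiving domain
    input  async_in;                // signal launched from another domain

    wire   meta;                    // first stage output, may be metastable

    // Both stages are explicit registers, so the crossing is easy to see
    // in a block diagram of the receiving clock domain.
    v_reg #(1) stage1 (meta,     dst_clk, async_in);
    v_reg #(1) stage2 (sync_out, dst_clk, meta);
endmodule // sync2_sketch
```

Keeping every crossing inside a module like this means the rest of the
receiving engine only ever sees signals that are already in its own clock
domain.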
For example, please refer to the places in Figure 3-2 labeled with numbers
in parentheses as you read the numbered paragraphs below:

1. All signals going from the Link Layer's Transmit Engine to its Receive
   Engine must go through synchronization via the module "link_rxsyn" before
   the signals can be used by the Receive Engine.

2. Similarly, all signals going from the Link Layer's Receive Engine to its
   Transmit Engine must go through synchronization via the module
   "link_txsyn" before the signals can be used by the Transmit Engine.

3. The discussion in Paragraph 1 and Paragraph 2 above also applies to the
   signals between the Transmit Engine and the Receive Engine of the
   Transport Layer.

4. Since the Parallel ATA Interface and the Receive Engine of the Transport
   Layer operate in different clock domains, all signals going from the
   Parallel ATA Interface to the Transport Layer's Receive Engine must go
   through synchronization via the module "dtrans_rxsyn" before the signals
   can be used by that Receive Engine.

5. Similarly, all signals going from the Transport Layer's Receive Engine to
   the Parallel ATA Interface must go through the synchronization module
   "dataif_syn" before the signals can be used by the ATA Interface.

4. Datapath Design
-----------------------------------------------------------------------------

Figure 4-1 is an example of a generalized datapath, and the paragraphs that
follow it describe some important observations from this figure.

            |<--- Control Inputs from the Controller (2) -->|
            |   |           |              |   |        |   |
            |...| (3a)      |              |   |  ...   |   |
          +-v---v--+        |Select        | +-v--------v-+ |
Input  N  |  See   | N      |              | | Simple (4) | |
  A ---/--> Figure +-/-+  + | (5)          | |Random Logic| |
 (1)      |  4-2   |   |  |\v (3d)         | +-+--------+-+ |
          +-+---+--+   |  | \    +---+     |...|        |...| (3b)
            |...|      +-->0 +   |   |   +-v---v--+   +-v---v-+-+
            v   v         |  | N | R | N |  See   | N |  See  | |  N Output
                          |  +-/-> E +-/-> Figure +-/->Figure | +--/--> Q
            |...| (3a)    |  | ^ | G | ^ |  4-2   |   |  4-3  | |      (1)
          +-v---v--+   +-->1 + | |   | | +-+---+--+   +-+---+-+^+
Input  N  |  See   | N |  | /  | +-^-+ |   |...|        |...|  | (5)
  B ---/--> Figure +-/-+  |/   |   |   |   v   v        v   v CLK
 (1)      |  4-2   |      +    |  CLK  |                    |
          +-+---+--+     (3c)  |       | (1)                |
            |...|              K  (1)  Y Internal Signals   |
            v   v                                           |
            |<--- Control Outputs to the Controller (2) --->|

            Figure 4-1: Block Diagram of the General Datapath

When you read the numbered paragraphs below, please refer to the places in
Figure 4-1 labeled with the same numbers in parentheses:

1. This simple N-bit datapath has two N-bit data inputs (A and B) and one
   N-bit data output Q. The internal signals K and Y are marked here to
   facilitate the discussion of the pipeline register in Paragraph 5 below.

2. Other than the N-bit data inputs and outputs discussed in Item 1, a
   generalized datapath should also have Control Inputs from the controller
   and Control Outputs to the controller (see Section 5).

3. In general, a datapath consists of the following components:
   a. Combinational Datapath Elements, shown in Figure 4-2, where the N-bit
      Data Output and the Control Outputs depend ONLY on the current values
      of the N-bit Data Input and Control Inputs. Examples of Combinational
      Datapath Elements are the multiplexer and the ALU.
   b. Sequential Datapath Elements, shown in Figure 4-3, where the N-bit
      Data Output can depend on the current N-bit Data Input, the current
      Control Input, as well as the previous cycle's N-bit Data Output. An
      8-bit counter is an example of a Sequential Datapath Element.
   c. Multiplexers, which are just a special case of the Combinational
      Datapath Elements shown in Figure 4-2.
   d. Registers or Register Files, which can be considered a special case
      of the Sequential Datapath Elements shown in Figure 4-3.

4. The "Simple Random Logic" here is commonly referred to as "glue logic,"
   which consists of simple inverters, AND gates, and OR gates. In theory,
   all this "glue logic" can be integrated into the controller that is
   discussed in Section 5. In practice, however, it is sometimes simpler to
   just use some "glue logic" in the datapath.

5. The register described in Item 3d as well as the implicit register at the
   output of the Sequential Datapath Element (Item 3b) are commonly referred
   to as pipeline registers.

                  Control Inputs
                     | |...| |
                     | |   | |
                 +---v-v---v-v---+
                 |               |
          N      | Combinational |      N
     -----/------>   Datapath    +------/----->
       N-bit     |   Elements    |    N-bit
     Data Input  |               | Data Output
                 +---+-+---+-+---+
                     | |...| |
                     | |   | |
                     v v   v v
                  Control Outputs

            Figure 4-2: A Combinational Datapath Element

                 Control Inputs
                   | |...| |
                   | |   | |
               +---------------------+
               |   | |   | |         |
               | +-v-v---v-v---+---+ |
               | |             |   | |
               +->             |   +-+
          N      | Sequential  | R |      N
     -----/------>  Datapath   | E +------/----->
       N-bit     |  Elements   | G |    N-bit
     Data Input  |     +-----+ |   | Data Output
                 |     | REG < |   |
                 +-+-+-+-+-+-+-+-^-+
                   | |...| |     |
                   | |   | |    CLK
                   v v   v v
                 Control Outputs

   Figure 4-3: A Sequential Datapath Element with a Register at Its Output

The main function of the explicit pipeline register shown in Figure 4-1's
Item 3d and Item 5 is to limit the datapath's critical path delay to a value
less than the desired cycle time of the system. The effect of such a
pipeline register can be best understood with a timing diagram.

*** Logic Design Guideline 4-1 ***
    The best way to study the effect of the datapath's pipeline registers is
    to draw a timing diagram showing each register's effect on its outputs
    with respect to the rising or falling edge of the register's input
    clock.

Figure 4-4 below is an example of such a timing diagram for the generalized
datapath example shown in Figure 4-1.
In this timing diagram example (when you read the numbered paragraphs below,
please refer to the places in Figure 4-4 labeled with the same numbers in
parentheses):

1. The N-bit Input A and Input B settle to their known values "A" and "B"
   sometime after the rising edge of Cycle 2. For the sake of simplicity,
   let's assume all the Control Inputs (likely generated by a controller
   similar to the one described in Section 5) of this datapath are stable
   prior to the rising edge of Cycle 2 so that they are not factors in the
   critical delay path considerations. In an actual design, such assumptions
   will be verified by static timing analysis.

2. Due to the assumption about the Control Inputs listed in Item 1, we only
   need to make sure Input A and Input B settle early enough to allow the
   two Combinational Datapath Elements (Item 3a in Figure 4-1) and the
   multiplexer (Item 3c in Figure 4-1) to produce the Internal Signal K at
   least one set-up time prior to the rising edge of Cycle 3.

3. If the condition listed in Item 2 is met, the pipeline register can then
   capture the value of Internal Signal K and set the Internal Signal Y to
   the value "Y" one clock-to-q time after the rising edge of Cycle 3.

4. Once again due to the assumption about the Control Inputs listed in
   Item 1, as long as the Combinational Datapath Element after the pipeline
   register (Item 3d in Figure 4-1) together with the combinational logic
   within the Sequential Datapath Element (Item 3b in Figure 4-1) can
   produce the result for the Sequential Datapath Element's "implicit"
   register at least one set-up time prior to the rising edge of Cycle 4,
   the Output of this datapath will be set to the stable value "Q" one
   clock-to-q time after the rising edge of Cycle 4.
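
The two set-up conditions in Items 2 and 4 can also be written as
inequalities. This is only a sketch: every symbol below is a name introduced
here for illustration and none of them appears in the original text. T_cycle
is the clock period, t_su is the register set-up time, t_cq is the register
clock-to-q time, t_arr is how long after the rising edge of Cycle 2 the
inputs A and B settle, and the remaining terms are the worst-case delays of
the blocks named in their subscripts.

```latex
% Item 2: inputs A/B -> elements (3a) -> mux (3c) -> K -> pipeline register (3d)
% Item 4: pipeline register (3d) -> Y -> element after the register
%         -> combinational logic inside (3b) -> implicit register
\begin{align}
t_{\mathrm{arr}} + t_{\mathrm{3a}} + t_{\mathrm{3c}} + t_{\mathrm{su}}
    &\le T_{\mathrm{cycle}} \\
t_{\mathrm{cq}} + t_{\mathrm{comb}} + t_{\mathrm{3b,logic}} + t_{\mathrm{su}}
    &\le T_{\mathrm{cycle}}
\end{align}
```

Without the pipeline register (Item 3d), the two left-hand sides would
effectively be chained into a single path that must fit in one cycle, which
is why the register is described as limiting the critical path delay.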

          |    1    |    2    |    3    |    4    |    5    |    6    |
          |         |         |         |         |         |         |
          +----+    +----+    +----+    +----+    +----+    +----+    +-
Clock     |    |    |    |    |    |    |    |    |    |    |    |    |
----------+    +----+    +----+    +----+    +----+    +----+    +----+
          |         |         |         |         |         |         |
---------------------+ +-------+ +--------------------------------------
Input A ///////////// X    A    X //////////////////////////////////////
---------------------+ +-------+ +--------------------------------------
          |         |   (1)   |         |         |         |         |
---------------------+ +-------+ +--------------------------------------
Input B ///////////// X    B    X //////////////////////////////////////
---------------------+ +-------+ +--------------------------------------
          |         |   (2)   |         |         |         |         |
---------------------------+ +-------+ +--------------------------------
Internal Signal K ///////// X    Y    X ////////////////////////////////
---------------------------+ +-------+ +--------------------------------
          |         |         |   (3)   |         |         |         |
-------------------------------+ +-------+ +----------------------------
Internal Signal Y ///////////// X    Y    X ////////////////////////////
-------------------------------+ +-------+ +----------------------------
          |         |         |         |   (4)   |         |         |
-----------------------------------------+ +-------+ +------------------
Output Q //////////////////////////////// X    Q    X //////////////////
-----------------------------------------+ +-------+ +------------------

       Figure 4-4: A Timing Diagram of the Datapath's Pipeline Register

Item 4 above brings up an interesting observation about the Sequential
Datapath Element shown in Figure 4-3, where the implicit register of this
datapath element is shown to be on the output side of the element. The
placement of the register on the output side (versus the input side) in the
drawing is intentional. It reflects the actual placement of the register in
hardware. I like to place such a register at the output (versus the input)
so that all N bits of the output will be stable at the same time, one
clock-to-q time after each rising edge of the clock.
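
As an illustration of a Sequential Datapath Element with its register on the
output side, here is a minimal sketch of the 8-bit counter mentioned earlier
as an example of such an element. The module name cnt8_sketch, its ports,
and its synchronous clear are all hypothetical choices made for this sketch;
v_reg is repeated from the sata_library.v excerpt in Section 2.

```verilog
// Register model repeated from the sata_library.v excerpt in Section 2.
module v_reg( q, c, d );
    parameter n = 1;
    output [n-1:0] q;
    input  [n-1:0] d;
    input          c;
    reg    [n-1:0] state;
    assign #(1) q = state;          // 1 time-unit clock-to-q
    always @(posedge c) begin
        state = d;
    end
endmodule // v_reg

// HYPOTHETICAL 8-bit up counter (not from the original design).
module cnt8_sketch( count, at_max, clk, clear, enable );
    output [7:0] count;             // N-bit Data Output, straight from a register
    output       at_max;            // Control Output, strictly combinational
    input        clk;
    input        clear;             // Control Input: synchronous clear
    input        enable;            // Control Input: count when set

    wire   [7:0] next;              // input of the output-side register

    // Part 1: zero-delay combinational logic (clear has priority).
    assign next   = clear  ? 8'h00 :
                    enable ? (count + 8'h01) : count;
    assign at_max = (count == 8'hff);

    // Part 2: the register, placed at the output of the element so that
    // all 8 bits of count change together one clock-to-q time after the
    // rising edge of clk.
    v_reg #(8) count_ff (count, clk, next);
endmodule // cnt8_sketch
```

The count output is X in simulation until clear has been asserted across one
rising clock edge, since this sketch has no asynchronous reset.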
Also shown in Figure 4-3 is that some Control Outputs of the Sequential
Datapath Element can also be registered. This, however, is not as common as
having the Control Outputs be strictly combinational, which gives the user
of these signals (likely to be the controller, see Section 5) the
flexibility of using these values one cycle earlier if the critical timing
is not violated.

The above discussion of the timing diagram in Figure 4-4 illustrates that
the logic designer cannot draw an accurate timing diagram unless he or she
knows the exact location of the registers relative to the combinational
logic. This brings us to a corollary of Logic Design Guideline 4-1:

*** Logic Design Guideline 4-2 ***
    The block diagram of the datapath should show ALL registers, including
    the implicit register of the Sequential Datapath Element.

Enclosed in Appendix B is the example Verilog file link_txdp.v, which models
the datapath for the Link Layer Transmit Engine (see Reference [8]). Let's
take a look at some interesting observations from link_txdp.v:

*** Verilog Coding Guideline 4-1 ***
    Keep the Verilog coding of the datapath simple and straightforward.
    Leave the fancy coding (IF any) to the datapath elements and place such
    elements in a separate (library) file.
For example, the Verilog coding of link_txdp.v is simplified by using the
following two Sequential Datapath Elements:

    /*
     * Scrambler
     */
    l_scramble scrambler (
        .scr_out  (scr_out),
        .scr_in   (32'hc2d2768d),
        .scr_init (txscr_init),
        .scr_run  (txscr_run),
        .clk      (txclk4x),
        .reset    (lktx_reset));

    /*
     * CRC Calculator
     */
    l_crccal crc_calculator (
        .crc_out  (crc_out),
        .crc_in   (32'h52325032),
        .datain   (tp_txdata),
        .crc_init (txcrc_init),
        .crc_cal  (txcrc_cal),
        .clk      (txclk4x),
        .reset    (lktx_reset));

As well as a Combinational Datapath Element:

    /*
     * Generate the primitive (prim_out) based on the selection (sel_prim)
     */
    l_primgen primgen (.prim_out (prim_out), .sel_prim (sel_prim));

More specifically, the Verilog code in link_txdp.v only shows what the logic
designer cares about the most at the datapath level: how the datapath
elements (registers, multiplexers, counters, etc.) are connected together.

The detailed modeling of these datapath elements is done in link_library.v,
which contains all library elements for the Link Layer. For your reference,
link_library.v is also attached in Appendix B (see Reference [9]). Below are
a few lines from link_library.v that define the Scrambler.

    /********************************************************************
     * l_scramble: 32-bit scrambler that can be:
     *     a. Reset to all zeros asynchronously.
     *     b. Loaded with a fixed pattern synchronously.
     *     c. Kept at its old value if scramble is not enabled.
     *     d. Updated synchronously based on a LFSR algorithm.
     ********************************************************************/
    module l_scramble (scr_out, scr_in, scr_init, scr_run, clk, reset);
        output [31:0] scr_out;      // Scrambler's output
        input  [31:0] scr_in;       // Initial pattern to be loaded
        input         scr_init;     // Load the initial pattern
        input         scr_run;      // Update scr_out based on a LFSR
        input         clk;
        input         reset;

        reg    [31:0] scram;        // Scramble data pattern
        reg           a15, a14, a13,    // Intermediate scramble bits
                      a12, a11, a10, a9, a8, a7,
                      a6, a5, a4, a3, a2, a1, a0;
        wire   [31:0] runmuxout;    // Output of the scr_run MUX
        wire   [31:0] lastmux;      // Output of the final MUX

        /*
         * Combinational logic to produce the scramble pattern,
         * which should be updated whenever scr_out changes.
         * This logic was copied from Frank Lee's scramble.v
         */
        always @(scr_out) begin
            a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
            a14 = scr_out[30] ^ scr_out[21] ^ scr_out[17];
            a13 = scr_out[31] ^ scr_out[22] ^ scr_out[18];
               :                :
            scram[2] = a15^a14^a13;
            scram[1] = a15^a14;
            scram[0] = a15;
        end // Scrambling logic

        /*                                     Priority:
         *             scram scr_out           -------------------------------
         *                 |   |               reset    (asynchronous): highest
         *             +---v---v---+           scr_init (synchronous):  middle
         *    scr_run-->\S 1   0  /            scr_run  (synchronous):  lowest
         *               +---+---+ scr_in
         *                   |       |
         *               +---v-------v---+
         *                \  0       1 S/<--scr_init (higher priority than scr_run)
         *                 +-----+-----+
         *                       |
         *                       v
         *                    lastmux
         */
        v_mux2e #(32) run_mux  (runmuxout, scr_run, scr_out, scram);
        v_mux2e #(32) init_mux (lastmux, scr_init, runmuxout, scr_in);
        v_regre #(32) scr_ff   (scr_out, clk, lastmux,
                                (scr_run | scr_init), reset);
    endmodule // l_scramble

The definition of the Scrambler l_scramble (the "l_" prefix indicates that
it is defined in link_library.v) illustrates another Logic Design Guideline:

*** Logic Design Guideline 4-3 ***
    While designing a Sequential Datapath Element, separate the element
    into two parts: (1) the combinational logic, and (2) the register.
For example, in l_scramble, the combinational logic of the Scrambler is modeled by the "always" statement:

    always @(scr_out) begin
        a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
            :
            :
        scram[2] = a15^a14^a13;
    end // Scrambling logic

while the register is modeled by the 32-bit wide "v_regre" defined in the library shared by the entire design team (see Verilog Coding Guideline 4-2 below):

    v_regre #(32) scr_ff (scr_out, clk, lastmux, (scr_run | scr_init), reset);

The use of "v_regre" (the prefix "v_" indicates that this element is defined in the common library) illustrates the following Verilog Coding Guideline:

*** Verilog Coding Guideline 4-2 ***
    The Verilog coding of the datapath elements should make use of the standard logic elements (registers, multiplexers, etc.) already defined in the library discussed in Verilog Coding Guideline 2-1.

The last file included in Appendix B is "link_defs.v" (see Reference [10]), which defines all the "symbolic values" (i.e. symbolic names assigned to given constant values) to be used by all the Verilog files for the Link Layer. For example, the following line:

    `include "link_defs.v"

is used in both the datapath file (link_txdp.v) and the Link Layer library file (link_library.v) so that all the symbolic values defined in link_defs.v can be used by these two files.
Below are some examples of these symbolic values that are specific to the datapath:

    /*
     * Number of primitives and the bit position of the 1-hot encoded vector
     */
    `define num_prim 18

    // Basic Primitives
    `define B_ALIGN  0
    `define B_SYNC   1
    `define B_CONT   2
        :       :
    `define B_X_RDY  9
        :       :
    `define B_PMACK  16
    `define B_PMNAK  17

These symbolic values are then used by the datapath file (link_txdp.v) in the following way:

    /*
     * Interconnections within this portion of the datapath
     */
    wire [`num_prim:0]      // Number of primitives + D10.2
        sel_prim;           // Select the proper primitives

    // Primitive sent by the Transmit Controller
    assign sel_prim[`B_ALIGN] = txsn_align;
    assign sel_prim[`B_X_RDY] = txsn_xrdy;

It should be obvious that the Verilog code above is much easier to maintain and much easier to understand than the equivalent Verilog code:

    /*
     * Interconnections within this portion of the datapath
     */
    wire [18:0] sel_prim;

    // Primitive sent by the Transmit Controller
    assign sel_prim[0] = txsn_align;
    assign sel_prim[9] = txsn_xrdy;

This example of how Verilog code uses symbolic values to improve its ease of maintenance leads us to the following Verilog Coding Guideline:

*** Verilog Coding Guideline 4-3 ***
    Define symbolic values (see also Verilog Coding Guideline 5-2) in a header file (example: link_defs.v) and include this header file in all files that can make use of these symbolic values, to make the Verilog code easier to maintain and easier to understand.

Other symbolic values defined in link_defs.v such as:

    // Number of TX states and bit position of the 1-hot state encoding
    `define num_lktxstate 15
    `define B_NOCOMM      0
    `define B_SENDALIGN   1
    `define B_NOCOMMERR   2
        :         :     :
    `define B_BUSYRCV     13
    `define B_POWERDOWN   14

    // State Values
    `define RESET      15'h0000    // All bits are zeros
    `define NOCOMM     15'h0001    // Bit 0 is set
    `define SENDALIGN  15'h0002    // Bit 1 is set
        :         :     :
    `define POWERSAVE  15'h4000    // Link layer is powered down

are used for the Verilog code that models the controller for the Link Layer.
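As a compact illustration of Verilog Coding Guideline 4-3, here is a minimal, self-contained sketch. Every name in it (the EX_* macros, the module ex_primsel, and its ports) is hypothetical and is not taken from link_defs.v or link_txdp.v; in a real design the `define lines would sit in a shared header file and be pulled in with `include.

```verilog
// Hypothetical names throughout: the EX_* macros and module ex_primsel
// are illustrative only and do not come from the appendices.
`define EX_NUM_SEL  3           // Number of select lines
`define EX_B_ALIGN  0           // Bit position of each select line
`define EX_B_SYNC   1
`define EX_B_X_RDY  2

module ex_primsel (sel, sn_align, sn_sync, sn_xrdy);
    output [`EX_NUM_SEL-1:0] sel;   // 1-hot select vector
    input                    sn_align;
    input                    sn_sync;
    input                    sn_xrdy;

    // Each select bit is named by its symbolic position, so adding or
    // renumbering a select line only touches the `define lines above.
    assign sel[`EX_B_ALIGN] = sn_align;
    assign sel[`EX_B_SYNC]  = sn_sync;
    assign sel[`EX_B_X_RDY] = sn_xrdy;
endmodule // ex_primsel
```

The point of the sketch is only that no bare bit number appears in the module body; the vector width and every bit select are derived from the symbolic values.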
How these symbolic values can be used to simplify the Verilog code of the controller will be explained in Section 5. More specifically, please refer to Verilog Coding Guideline 5-1 in Section 5. 5. Controller Design ----------------------------------------------------------------------------- Almost without exception, within the core of every controller is one or more finite state machine(s). This is shown in Figure 5-1 where only one finite state machine is shown for simplicity. Reader with enough imagination should be able to visualize how this picture can be generalized with multiple finite state machines. +------------------------------------------------+ | A General Controller | | +--------------+---+---+ | (2a) Inputs | (1) | Finite State |S | | | Outputs -----------+---------> Machine |T R| +-+--------------------> | | +---+ | (4) |A E| | | +---+ | Type 1 | | | R | | |T G| | | | R | | | +-> E +-+->See Figure 5-2|E | | +-> E +-+------------> | | | G | | | or Figure 5-3| | | | | G | | | Outputs | | +-^-+ | +--------------+-^-+---+ | +-^-+ | | Type 2 | | | | | | | | | (2b) | | clk | clk | clk | | | | | | | | | | | +-v-------v-+ | | | | | | | | | +------------------------> Simple | | Outputs | | | Random +----------> | +--------------------------------> Logic (3) | | Type 3 | | | | (2c) | +-----------+ | +------------------------------------------------+ Figure 5-1: Block Diagram of the General Controller Here are some important observations from Figure 5-1. When you read the numbered paragraphs below, please refer to the places in Figure 5-1 labeled with the same numbers in parentheses: 1. The inputs to the controller are divided into two groups. The first group is used as inputs to the finite state machine directly while the second group is "staged" by one or more stage(s) of pipeline registers before being used as inputs by the finite state machine. 2. As far as the outputs of the controller are concerned, they can be classified into three types: a. 
Outputs that come directly from the finite state machine's outputs.
      b. Outputs of the finite state machine after they have been staged by one or more stage(s) of pipeline registers.
      c. Outputs of some random logic (see also Paragraph (3) below) whose inputs can be any of the signals described in Paragraph (1), Paragraph (2a), or Paragraph (2b) above.

   3. The "Simple Random Logic" here is commonly referred to as "glue logic," which consists of simple inverters, AND gates, and OR gates. In theory, all of this "glue logic" can be integrated into the finite state machine shown in either Figure 5-2 or Figure 5-3. In practice, however, it is sometimes simpler to just use some "glue logic."

   4. In general, there are two types of finite state machines:
      a. The simple Moore Machine shown in Figure 5-2, whose outputs depend ONLY on the current state.
      b. The more complex Mealy Machine shown in Figure 5-3, whose outputs depend on BOTH the current state and the inputs.

         +-------------------------------+
         |    +-------+        +---+     |                +--------+
         |  N |  Next |  Next  |S  |     |  Current       |        |
         +--/->       |  State |t R|     |   State        | Output |
              | State +---/---->a e+-----+-----/---------->        +--/--> Outputs
Inputs ---/--->       |   N    |t g|           N          | Logic  |  P
          M   | Logic |        |e  |                      |        |
              +-------+        +-^-+                      +--------+
                                 |
                               Clock

              Figure 5-2: The Moore State Machine

         +-------------------------------+
         |    +-------+        +---+     |                +--------+
         |  N |  Next |  Next  |S  |     |  Current       |        |
         +--/->       |  State |t R|     |   State        | Output |
              | State +---/---->a e+-----+-----/---------->        +--/--> Outputs
Inputs -+--/-->       |   N    |t g|           N          | Logic  |  P
        |  M  | Logic |        |e  |                +----->        |
        |     +-------+        +-^-+                |     |        |
        |                        |                  |     +--------+
        |  Q (Q <= M)          Clock                |
        +--/----------------------------------------+

              Figure 5-3: The Mealy State Machine

One question raised by Item 1 and Item 2 of Figure 5-1 (see Paragraphs 1 and 2 above) is: when and where should we use pipeline registers to stage the inputs or outputs?
This leads us to the following logic design guideline: *** Logic Design Guideline 5-1 *** The best way to decide when and where to use pipeline register or registers to stage the controller inputs and outputs is to draw a timing diagram showing each register's effect on its outputs with respect to rising or falling edge of the register's input clock. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | | | | | | | | | | | +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +- Clock | | | | | | | | | | | | | | | | | | | | | -------+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ | | | | | | | | | | | --------------+ +---------+ +---------------------------------------- Inputs /////// X (1) A X //////////////////////////////////////// --------------+ +---------+ +---------------------------------------- | | | | | | | | | | | ---------------+ +---------+ +--------------------------------------- Next State //// X B (2) X /////////////////////////////////////// ---------------+ +---------+ +--------------------------------------- | | | | | | | | | | | --------------------+ +---------+ +---------------------------------- Current State ////// X B (3) X ////////////////////////////////// --------------------+ +---------+ +---------------------------------- Figure 5-4: A Timing Diagram Showing Relative Timing One simple example of such a timing diagram is shown in Figure 5-4, which shows the effect of the State Register in Figure 5-2, the Moore State Machine. When you read the numbered paragraphs below, please refer to the places in Figure 5-4 labeled with the same numbers in parentheses: 1. We assume the M-bit inputs changes from unknown to "A" right after the rising edge of Cycle 2. 2. Assume the Next State Logic is such that as a result of Input being "A," the Next State will become "B" regardless of its Current State. 
Provided the Next State Logic can generate the Next State output within one cycle of Clock (an assumption that needs to be verified with static timing analysis), we no longer need to worry about the absolute delay of the Next State Logic.

   3. As long as Next State becomes "B" one set-up time before the rising edge of Cycle 3, the Current State will change to "B" one clock-to-q delay AFTER the rising edge of Cycle 3, due to the State Register.

In this simple example, only one register and three signals are shown. Needless to say, a real timing diagram will have multiple registers and many more signals. The basic idea, however, remains the same: show only the "relative timing," that is, show how the registers affect the timing of the signals with respect to the clock edge(s), not the absolute delays.

A corollary of Logic Design Guideline 5-1 is:

*** Logic Design Guideline 5-2 ***
    The block diagram of the controller should show ALL registers explicitly while the random logic can be represented by a simple black box.

By drawing all the registers EXPLICITLY in the block diagram, the designer will be less likely to make a mistake when he or she attempts to draw a "relative timing" diagram similar to the one shown in Figure 5-4 (see footnote below) while thinking about the sequence of events that needs to be controlled. Notice that in Figure 5-1, we try to meet Logic Design Guideline 5-2 by showing the State Register in the black box representing the Finite State Machine.

    Footnote: Even if the designer does not draw such a timing diagram explicitly on paper, he or she may still have to "draw" it implicitly in his or her head.

Notice that both Figure 5-2 and Figure 5-3 show finite state machines with an M-bit input, an N-bit state register, and a P-bit output.
The only difference is that in Figure 5-2, the Moore Machine, the P-bit output is a function of the N-bit current state only, while in Figure 5-3, the Mealy Machine, the P-bit output depends on both the N-bit current state and a Q-bit subset (Q is an integer smaller than or equal to M) of the M-bit inputs. Depending on the state encoding, the N-bit state register can represent a maximum of 2**N states, or a minimum of N states if one-hot encoding is used.

*** Logic Design Guideline 5-3 ***
    If possible, use one-hot encoding for the finite state machine's state encoding to simplify the Output Logic as well as the Next State Logic.

One-hot encoding refers to the encoding style where each bit of the State Register represents one state, and the corresponding bit is asserted only when the finite state machine is in the state represented by that bit. Consequently, only ONE bit of the N-bit state register is asserted at any given time. My experience is that one-hot encoding can greatly simplify the logic equations for the Output Logic block (in most cases reducing them to simple inverters, AND gates, and OR gates) as well as for the Next State Logic block.

Philosophically, the reason why one-hot encoding can simplify the output logic is simple: when the designer of a finite state machine creates a state, he or she does so for one purpose: the state indicates the need to set the outputs to some values different from those of any other state (if not, there is no need to have a separate state!). Therefore, if the state information is not one-hot encoded, the Output Logic must first decode the N-bit state register before it can generate the output. On the other hand, when one-hot encoding is used, the need for doing an N-to-2**N decode is eliminated.
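The decode argument above can be seen in a small side-by-side sketch. The four-state machine, its state names, and all ex_* / EXO_* identifiers below are hypothetical and serve only as an illustration; they do not appear in the appendices.

```verilog
// Hypothetical 4-state machine (IDLE, READ, WRITE, DONE). All ex_* and
// EXO_* names are illustrative only.

// Binary encoding: 2-bit state register. Every output must first
// decode the state value before it can be generated.
module ex_out_binary (rd_en, wr_en, busy, cur_state);
    output       rd_en;
    output       wr_en;
    output       busy;
    input  [1:0] cur_state;     // 00=IDLE, 01=READ, 10=WRITE, 11=DONE

    assign rd_en = (cur_state == 2'b01);
    assign wr_en = (cur_state == 2'b10);
    assign busy  = (cur_state == 2'b01) | (cur_state == 2'b10);
endmodule // ex_out_binary

// One-hot encoding: 4-bit state register. Each output is a bit select
// of the state register or a simple OR of bit selects; no decode.
`define EXO_B_IDLE   0
`define EXO_B_READ   1
`define EXO_B_WRITE  2
`define EXO_B_DONE   3

module ex_out_onehot (rd_en, wr_en, busy, cur_state);
    output       rd_en;
    output       wr_en;
    output       busy;
    input  [3:0] cur_state;     // Exactly one bit set in normal operation

    assign rd_en = cur_state[`EXO_B_READ];
    assign wr_en = cur_state[`EXO_B_WRITE];
    assign busy  = cur_state[`EXO_B_READ] | cur_state[`EXO_B_WRITE];
endmodule // ex_out_onehot
```

With only four states the saving is small; the comparisons in the binary version grow with the state count, while the one-hot version stays a bit select per state.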
Similarly, when one-hot encoding is used, the Next State Logic does not need to perform the equivalent of the N-to-2**N decode before deciding what the next state is, and once the next state is decided, it does not need to perform the equivalent of a 2**N-to-N encoding of the next state.

One drawback of one-hot encoding is that for a finite state machine with a large number of states (i.e. N is a big number in Figure 5-2 and Figure 5-3), the State Register can be very wide. A wide register, however, is usually not that bad a problem. In any case, in order to keep the design easy to understand and debug, one may want to avoid using "one BIG and complex" finite state machine anyway:

*** Logic Design Guideline 5-4 ***
    Instead of designing a controller with a giant and complex finite state machine at its core, it may be easier to break the controller into multiple smaller controllers, each with a smaller and simpler finite state machine at its core.

In both Figure 5-2 and Figure 5-3, it is possible to integrate the Output Logic block and the Next State Logic block into one single random logic block. However, in order to keep the logic design easy to understand:

*** Logic Design Guideline 5-5 ***
    For finite state machine design, keep the Next State Logic block separate from the Output Logic block.

As I will show you later in Verilog Coding Guidelines 5-3 and 5-4, the Verilog code that models the finite state machine is also easier to read and understand if the Next State Logic block is kept separate from the Output Logic block.

One final word on the Mealy Machine shown in Figure 5-3. The Output Logic's inputs are shown to come from both the Current State and the Inputs. In order to simplify the Output Logic block, it is "logically" equivalent to use some of the Next State bits (i.e. outputs of the Next State Logic prior to the State Register) as inputs to the Output Logic block. This is shown in Figure 5-5. This, however, should be done with extreme care.
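To make the "logically equivalent" remark concrete, here is a minimal sketch of a two-state one-hot machine with one Mealy output written both ways. The machine, the signal names (req, done, busy_next), and all exm_* / EXM_* identifiers are assumptions made for illustration only; the two output modules agree for the legal one-hot states, and the second one places the Output Logic in series with the Next State Logic.

```verilog
// Hypothetical 2-state one-hot machine. All exm_* and EXM_* names
// are illustrative only.
`define EXM_B_IDLE  0
`define EXM_B_BUSY  1
`define EXM_IDLE    2'b01
`define EXM_BUSY    2'b10

// Next State Logic: its only output is next_state
module exm_nextstate (next_state, cur_state, req, done);
    output [1:0] next_state;
    input  [1:0] cur_state;
    input        req;
    input        done;
    reg    [1:0] next_state;

    always @(cur_state or req or done) begin
        case (cur_state)
            `EXM_IDLE: if (req)  next_state = `EXM_BUSY;
                       else      next_state = `EXM_IDLE;
            `EXM_BUSY: if (done) next_state = `EXM_IDLE;
                       else      next_state = `EXM_BUSY;
            default:             next_state = `EXM_IDLE;
        endcase
    end
endmodule // exm_nextstate

// Output Logic in the form of Figure 5-3:
// inputs are the current state and the machine inputs only.
module exm_out_fig53 (busy_next, cur_state, req, done);
    output       busy_next;
    input  [1:0] cur_state;
    input        req;
    input        done;

    assign busy_next = (cur_state[`EXM_B_IDLE] & req) |
                       (cur_state[`EXM_B_BUSY] & ~done);
endmodule // exm_out_fig53

// Output Logic in the form of Figure 5-5: reuses one Next State bit.
// Same value as exm_out_fig53 for the legal one-hot states, but the
// path now runs through the Next State Logic first.
module exm_out_fig55 (busy_next, next_state);
    output       busy_next;
    input  [1:0] next_state;

    assign busy_next = next_state[`EXM_B_BUSY];
endmodule // exm_out_fig55
```

The second form is smaller, but its delay is the delay of the Next State Logic plus the delay of whatever the output feeds, which is the reason for the caution that follows.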
*** Logic Design Guideline 5-6 ***
    In a Mealy Machine design, it is possible to use the Next State Logic block's outputs as inputs to the Output Logic block. This must be done with caution since the total delay of the two logic blocks may become the critical path of the controller.

         +-------------------------------+
         |    +-------+        +---+     |                +--------+
         |  N |  Next |  Next  |S  |     |  Current       |        |
         +--/->       |  State |t R|     |   State        | Output |
              | State +--/--+-->a e+-----+-----/---------->        +--/--> Outputs
Inputs -+--/-->       |  N  |  |t g|           N          | Logic  |  P
        |  M  | Logic |     |  |e  |            +--------->        |
        |     +-------+     |  +-^-+            |         |        |
        |                   |    |              |    +---->        |
        |                   |  Clock R (R <= N) |    |    +--------+
        |                   +--------/----------+    |
        |  Q (Q <= M)                                |
        +--/-----------------------------------------+

              Figure 5-5: An Alternate Form of the Mealy State Machine

Enclosed in Appendix C are two Verilog files illustrating the various controller design guidelines:

    trans_defs.v    defines all the symbolic values that apply to all Transport Layer files, see Reference [11].
    dtrans_txctl.v  models the controller for the Device Dongle's Transport Layer Transmit Engine, see Reference [12].

First, let's take a look at some interesting observations from trans_defs.v.

*** Verilog Coding Guideline 5-1 ***
    If one-hot encoding is used for the finite state machine (see Logic Design Guideline 5-3), define a symbolic value for each bit position as well as a symbolic value for the binary value when that bit position is set. This makes the Verilog code much easier to read and understand.

For example, here are some lines from the trans_defs.v file attached in Appendix C (readers can find the entire definition in Appendix C):

    /*
     * Define the state values and bit position for Device's Transmit Finite
     * State machine (FSM in dtrans_txctl). This FSM implements the "transmit"
     * states described in Section 8.7 (PP. 197-205) of SATA Spec, 1.0.
     */
    `define num_dttxfsm  15
    `define B_DTTXIDLE   0
    `define B_DTCHKTYP   1
    `define B_DTREGFIS   2      // Spec's DT_RegHDFIS
    `define B_DTPIOSTUP  3      // Spec's DT_PIOSTUPFIS
        :         :      :
    `define B_DTBISSTA   14

    // Device Dongle's TX FSM State Values
    `define DTTXIDLE   15'h0001
    `define DTCHKTYP   15'h0002
    `define DTREGFIS   15'h0004     // Spec's DT_RegHDFIS
    `define DTPIOSTUP  15'h0008     // Spec's DT_PIOSTUPFIS
        :         :      :
    `define DTBISSTA   15'h4000

Notice that I have used "define" to create these symbolic values:

*** Verilog Coding Guideline 5-2 ***
    One common convention used by many Verilog code writers is to use "define" for constant values such as:

        `define DTTXIDLE 15'h0001

    while "parameter" is used ONLY for things that can change, such as the width of registers, muxes, etc. (see also Section 2).

    /***************************************************************
     * Simple N-bit register with a 1 time-unit clock-to-q time
     ***************************************************************/
    module v_reg( q, c, d );
        parameter n = 1;
        input  [n-1:0] d;
        input          c;
        output [n-1:0] q;
        reg    [n-1:0] state;

        assign #(1) q = state;

        always @(posedge c) begin
            state = d;
        end
    endmodule // v_reg

Next, let's look at the file dtrans_txctl.v. The main module of this file consists of the following sections, clearly labeled by comments:

    module dtrans_txctl (
        // Outputs
        tp_acksendreg,
            :
        senddata,
        // Inputs
        at_sendreg,
            :
        tptx_reset);

        /*
         * Next State Logic and the State Register for the finite state machine
         */
        // Next State Logic
        dtrans_txfsm dtrans_txfsm ( ...

        // State Register
        v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

        /*
         * Counter and its MUX tree to select the count limit
         * for the generation of the expire signal
         */

        /*
         * Output Logic for generating output signals
         */
    endmodule // dtrans_txctl

This leads to the following Logic Design and Verilog Coding guidelines.

*** Verilog Coding Guideline 5-3 ***
    Use an explicit State Register and separate the Next State Logic from this explicit register.
For example, in dtrans_txctl.v, we have:

    /*
     * Next State Logic and the State Register for the finite state machine
     */
    // Next State Logic
    dtrans_txfsm dtrans_txfsm (
        // Outputs
        .next_state   (next_state),
        // Inputs
        .cur_state    (cur_state),
        .at_sendreg   (at_sendreg),
        .at_senddmaa  (at_senddmaa),
            :
        .txtimeout    (txtimeout),
        .expire       (expire),
        .tptx_reset   (tptx_reset));

    // State Register
    v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

The Next State Logic here is implemented in the separate "dtrans_txfsm" module in dtrans_txctl.v. The module "dtrans_txfsm" has only one output, the "next_state" vector, and contains only one thing: a "Case Statement" enclosed in an "always" block:

*** Verilog Coding Guideline 5-4 ***
    The Next State Logic, with only ONE output (the "next_state" vector), can be implemented easily with a Verilog Case statement.

    /*************************************************************************
     * Module dtrans_txfsm: Random logic for the transmit finite state machine
     ************************************************************************/
    module dtrans_txfsm (
        // Outputs
        next_state,
        // Inputs
        cur_state,
        at_sendreg,
        at_senddmaa,
            :
        expire,
        tptx_reset);
            :
            :
        always @(cur_state or at_sendreg or at_senddmaa ...
                 /*** List ALL Inputs of this module ***/
                 txtimeout or expire or tptx_reset) begin
            if (tptx_reset) begin
                next_state = `DTTXIDLE;
            end
            else begin
                case (cur_state)
                    `DTTXIDLE:
                        if (~r2t_rxempty) begin
                            /*
                             * Give the receive engine higher priority
                             */
                            next_state = `DTCHKTYP;
                        end
                            :
                        end
                            :
                    `DTBISSTA:
                        if (~lk_txfsmidle & ~txtimeout) begin
                            next_state = `DTBISSTA;
                        end
                            :
                    default: begin
                        // We should never be here
                        next_state = `DTWAITTXID;
                        $display (
                            "*** Warning: Undefined HTP RX State, cur_state = %b ***",
                            cur_state);
                    end
                endcase
            end // End else (tptx_reset == 0)
        end // End always
    endmodule // dtrans_txfsm

Notice that the module "dtrans_txfsm" has ONLY one output, "next_state."
This is a very desirable feature when we use the Verilog "Case Statement," because one thing we have to be careful about is that every output MUST be assigned a defined value in each branch of the Case statement. Otherwise, the synthesis tool will generate a latch to keep the old value, which in most cases is NOT what the logic designer intends. Having only one output (the "next_state" vector) for the Next State Logic is one reason why Logic Design Guideline 5-5 encourages you to separate the Next State Logic block from the Output Logic block.

In many finite state machine designs, the number of states can be reduced, and the Next State Logic therefore simplified, if one takes advantage of the fact that the state machine wants to stay in a certain state for "N cycles" (where N is a fixed integer >= 1), then go to the next state and stay there for another "M cycles" (M is another integer >= 1 but != N) before moving on to another state. One example of this behavior is the DRAM controller, where the controller will enter the "Row Address Active" state for a few cycles, then go to the "Column Address Active" state for a few cycles, before moving on to the "Precharge" state, etc.

*** Logic Design Guideline 5-7 ***
    A finite state machine containing states whose transitions to their next states are governed only by the number of cycles it has to wait can be simplified by building a multiplexer tree to select the number of cycles a counter must count before generating an "expire" signal to trigger the state transition.

Logic Design Guideline 5-7 is illustrated by the following Verilog code in dtrans_txctl.v. In a nutshell:

   1. We start the counter (count_enable = 1) when the current state is one of: DTREGFIS, DTPIOSTUP, DTXMITBIS, or DTDMASTUP.
      Since we are using one-hot encoding, we are in one of these states when the corresponding bit in the cur_state register (cur_state[`B_DTREGFIS], cur_state[`B_DTPIOSTUP], cur_state[`B_DTXMITBIS], or cur_state[`B_DTDMASTUP]) is set.

          assign count_enable = cur_state[`B_DTREGFIS]  | cur_state[`B_DTPIOSTUP] |
                                cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP];

          v_countN #(`log_maxfis) expire_count (
              .count_out     (wcount),
              .count_enable  (count_enable),
              .clk           (txclk4x),
              .reset         (tptx_reset | expire));

   2. Based on the current state, the multiplexer tree is used to select the number of cycles the counter must count (count_limit) before the state is triggered to transition to the next state.

          /*
           * Counter and its MUX tree to select the count limit
           * for the generation of the expire signal
           */
          v_mux2e #(`log_maxfis) regpio_mux (num_regpio,
              cur_state[`B_DTPIOSTUP], `NDFISREGm1, `NDFISPIOSm1);
          v_mux2e #(`log_maxfis) dmabis_mux (num_dmabis,
              cur_state[`B_DTXMITBIS], `NBFISDMASm1, `NBFISBISTAm1);
          v_mux2e #(`log_maxfis) cntlmt_mux (count_limit,
              (cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP]),
              num_regpio, num_dmabis);

      The number of cycles the counter needs to count for each state is defined in trans_defs.v:

          `define NDFISREGm1    3'd4    // Device-to-Host (D) Register (REG)
          `define NDFISPIOSm1   3'd4    // Device-to-Host (D) PIO Setup (PIOS)
          `define NBFISDMASm1   3'd6    // Bidirectional (B) DMA Setup (DMAS)
          `define NBFISBISTAm1  3'd2    // Bidirectional (B) BIST Activate (BISTA)

   3. Finally, the 3-bit comparator is used to generate the "expire" signal, which is used as an input to the Next State Logic, to trigger the state transition when the counter reaches the "count_limit" selected by the MUX tree in Step 2.
          v_comparator #(`log_maxfis) expire_cmp (count_full, wcount, count_limit);
          assign expire = count_full & count_enable;

The last part of the Verilog code in dtrans_txctl.v:

    /*
     * Random logic for generating output signals
     */
    assign tp_acksendreg  = cur_state[`B_DTREGFIS];
        :
    assign tp_acksenddata = cur_state[`B_DTDATAFIS];
    assign tp_sendndfis   = cur_state[`B_DTREGFIS]  | cur_state[`B_DTPIOSTUP] |
                            cur_state[`B_DTDMASTUP] | cur_state[`B_DTDMAACT]  |
                            cur_state[`B_DTXMITBIS];

shows how the output logic and the "glue logic" (see Item 3 of Figure 5-1) can be implemented with simple "assign" statements.

*** Verilog Coding Guideline 5-5 ***
    With the more complex Next State Logic already taken care of by the "Case Statement" (see Verilog Coding Guideline 5-4) and with the help of one-hot encoding for the state machine, the Output Logic can usually be implemented easily with simple assign statements.

6. Miscellaneous Verilog Coding Guidelines
-----------------------------------------------------------------------------

If you look at the Verilog files in Appendix A, Appendix B, and Appendix C, you will notice that all the Verilog files have a very similar format.

*** Verilog Coding Guideline 6-1 ***
    In order to keep the Verilog files easy to read and easy to understand for every member of the design team, adopt a standard format and use the same format for all Verilog files.
For example, the link_txdp.v file in Appendix B follows this format:

    module module_name (
        // Bi-directional ports (if any)
        bi_port1,       //*** First list the inout ports (if any)
        bi_port2,       //*** List one port per line
        // Output ports
        o_port3,        //*** Then list the output ports
        o_port4,
        // Input ports
        i_port5);       //*** Finally, list the input ports

        /*
         * Declare all bi-directional ports
         */
        inout bi_port1;     //*** Declare one port per line
        inout bi_port2;

        /*
         * Declare all output ports
         */
        output o_port3;
        output o_port4;

        /*
         * Declare all input ports
         */
        input i_port5;

        /*
         * After all ports are declared, declare all the wires
         */
        wire wire1;         //** Declare one wire per line
        wire wire2;

        /*
         * Declare all registers (if any)
         */
        reg reg1;           //** Declare one register per line
        reg reg2;

        /*
         * Core of the Verilog code
         */
    endmodule

Notice that in the link_txdp.v file in Appendix B, when the module "l_scramble" is instantiated, explicit connection (example: .reset (lktx_reset)) is used.

    l_scramble scrambler (
        .scr_out   (scr_out),
        .scr_in    (32'hc2d2768d),
        .scr_init  (txscr_init),
        .scr_run   (txscr_run),
        .clk       (txclk4x),
        .reset     (lktx_reset));

*** Verilog Coding Guideline 6-2 ***
    In order to avoid confusion about which wire is connected to which port, use explicit connection (example: .port_name (wire)) when a module is instantiated.

The module l_scramble is defined in the file link_library.v, which is also included in Appendix B. Notice the detailed comment in this module:

    /*                                        Priority:
     *              scram   scr_out           -------------------------------
     *                |        |              reset    (asynchronous): highest
     *            +---v--------v---+          scr_init (synchronous):  middle
     *  scr_run-->\S  1        0  /           scr_run  (synchronous):  lowest
     *             +------+-------+  scr_in
     *                    |             |
     *                +---v-------------v---+
     *                 \  0             1 S/<--scr_init (higher priority than scr_run)
     *                  +--------+--------+
     *                           |
     *                           v
     *                        lastmux
     */

*** Verilog Coding Guideline 6-3 ***
    In order to keep the Verilog code easy to understand for everyone (including yourself :-), use detailed comments.
    More importantly, put in the comments as you do the coding because if you do not put in the comments now, it is unlikely you will put them in later.

Finally, one may notice the absence of "timescale" statements in all of the files that model the high level modules (Appendix A), the datapath (Appendix B), and the controller (Appendix C). The reason is that there is no need to have any timescale statements in the Verilog code if Verilog Coding Guideline 2-2 is followed:

*** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero clock-to-q time. All combinational logic (example: mux) has zero delay.

More specifically, as shown in Section 2, the v_reg and v_latch each have a "1 time unit" clock-to-q delay. This clock-to-q delay is the ONLY delay we have in our Verilog code. Consequently, our Verilog code will work no matter what time scale this time unit is set to (i.e. it can be set to 1ps, 1ns, 1ms, etc.). The only time we need to have a timescale statement is when we want to run simulation on our Verilog model.

*** Verilog Coding Guideline 6-4 ***
    Ideally, there should not be any "timescale" directive in any of the Verilog files that model the hardware (because they are not needed if we follow Verilog Coding Guideline 2-2). Consequently, there should be ONE and only ONE timescale directive in any Verilog simulation run, and that timescale directive should be placed at the beginning of the test bench file (see Reference [13]).

7. Summary of Logic Design and Verilog Coding Guidelines
-----------------------------------------------------------------------------

Below is a summary of all the logic design guidelines:

*** Logic Design Guideline 2-1 (MOST IMPORTANT) ***
    The design MUST be as simple as possible and easy to understand!

*** Logic Design Guideline 3-1 ***
    Use a hierarchical strategy that breaks the design into modules that consist of datapaths and controllers. More specifically:
    1. Divide the problem into multiple modules with clean and well defined interfaces.
    2. For each module:
       a. Design the datapath that can process the data for that module.
       b. Design the controller to control the datapath and produce control outputs (if any) to other adjacent modules.

*** Logic Design Guideline 3-2 ***
    Keep different clock domains separate and have an explicit synchronization module for signals that cross the clock domain.

*** Logic Design Guideline 4-1 ***
    The best way to study the effect of the datapath's pipeline registers is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

*** Logic Design Guideline 4-2 ***
    The block diagram of the datapath should show ALL registers, including the implicit register of the Sequential Datapath Element.

*** Logic Design Guideline 4-3 ***
    When designing a Sequential Datapath Element, separate the element into two parts: (1) the combinational logic, and (2) the register.

*** Logic Design Guideline 5-1 ***
    The best way to decide when and where to use a pipeline register or registers to stage the controller inputs and outputs is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

*** Logic Design Guideline 5-2 ***
    The block diagram of the controller should show ALL registers explicitly while the random logic can be represented by a simple black box.

*** Logic Design Guideline 5-3 ***
    If possible, use one-hot encoding for the finite state machine's state encoding to simplify the Output Logic as well as the Next State Logic.

*** Logic Design Guideline 5-4 ***
    Instead of designing a controller with a giant and complex finite state machine at its core, it may be easier to break the controller into multiple smaller controllers, each with a smaller and simpler finite state machine at its core.
*** Logic Design Guideline 5-5 ***
    For finite state machine design, keep the Next State Logic block separate from the Output Logic block.

*** Logic Design Guideline 5-6 ***
    In a Mealy Machine design, it is possible to use the Next State Logic block's outputs as inputs to the Output Logic block. This must be done with caution since the total delay of the two logic blocks may become the critical path of the controller.

*** Logic Design Guideline 5-7 ***
    A finite state machine containing states whose transitions to their next states are governed only by the number of cycles it has to wait can be simplified by building a MUX tree to select the number of cycles a counter must count before generating an "expire" signal to trigger the state transition.

Below is a summary of all the Verilog coding guidelines:

*** Verilog Coding Guideline 2-1 ***
    Model all the standard logic elements in a library file to be SHARED by ALL engineers in the design team.

*** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero clock-to-q time. All combinational logic (example: mux) has zero delay.

*** Verilog Coding Guideline 2-3 ***
    Use explicit registers and latches (example: v_reg and v_latch as shown in Section 2) in your Verilog coding. Do not rely on logic synthesis tools to generate latches or registers for you.

*** Verilog Coding Guideline 3-1 ***
    A separate Verilog file is assigned to the Verilog code for:
    1. Each datapath. Example: dtrans_txdp.v
    2. Each controller. Example: dtrans_txctl.v
    3. Each high level module, that is, a module at a hierarchy level higher than the datapath and the controller. Examples: link_tx.v, link_rx.v, and link.v

*** Verilog Coding Guideline 3-2 ***
    In order to keep the number of Verilog files under control, one should try not to assign a separate Verilog file to any low level module that is at a hierarchy level lower than the datapath and the controller.
*** Verilog Coding Guideline 3-3 ***
The Verilog code for a high-level module, that is, a module at a hierarchy
level higher than the datapath and the controller (examples: module
dtrans_tx, module dtrans_rx, and module dtrans), should not contain any
logic. It should only show how the lower-level modules are connected.

*** Verilog Coding Guideline 4-1 ***
Keep the Verilog coding of the datapath simple and straightforward. Leave
the fancy coding (IF any) to the datapath elements and place such elements
in a separate (library) file.

*** Verilog Coding Guideline 4-2 ***
The Verilog coding of the datapath elements should make use of the standard
logic elements (registers, multiplexers, etc.) already defined in the
library discussed in Verilog Coding Guideline 2-1.

*** Verilog Coding Guideline 4-3 ***
Define symbolic values (see also Verilog Coding Guideline 5-2) in a header
file (example: link_defs.v) and include this header file in all files that
can make use of these symbolic values. This makes the Verilog code easier to
maintain and easier to understand.

*** Verilog Coding Guideline 5-1 ***
If one-hot encoding is used for the finite state machine (see Logic Design
Guideline 5-3), define a symbolic value for each bit position as well as a
symbolic value for the binary value when that bit position is set. This
makes the Verilog code much easier to read and understand.

*** Verilog Coding Guideline 5-2 ***
One common convention used by many Verilog code writers is to use "define"
for constant values such as:
    `define DTTXIDLE 15'h0001
while "parameter" is used ONLY for things that can change, such as the
widths of registers, muxes, etc. (see also Section 2).

*** Verilog Coding Guideline 5-3 ***
Use an explicit State Register and separate the Next State Logic from this
explicit register.

*** Verilog Coding Guideline 5-4 ***
The Next State Logic, with only ONE output (the "next_state" vector), can be
implemented easily with a Verilog Case statement.

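Verilog Coding Guidelines 5-1 through 5-4 above can be sketched together in
one small controller. The three-state machine, the EX_ names, and the ports
below are assumptions invented for this example; they are not copied from
dtrans_txctl.v or any other file cited in the References. In a real design
the symbolic values would live in a shared header file rather than at the
top of the controller file.

```verilog
// Hypothetical three-state controller; state names, ports, and widths
// are assumptions for illustration only.

// One-hot encoding: a bit position and a state value for each state.
`define EX_IDLE_BIT 0
`define EX_BUSY_BIT 1
`define EX_DONE_BIT 2
`define EX_IDLE     3'b001
`define EX_BUSY     3'b010
`define EX_DONE     3'b100

module ex_ctl (clk, reset, start, finished, busy);
    input  clk, reset, start, finished;
    output busy;

    reg [2:0] state;        // explicit State Register
    reg [2:0] next_state;   // output of the Next State Logic (combinational)

    // Next State Logic: one output, one case statement.
    always @(state or start or finished) begin
        case (state)
            `EX_IDLE: next_state = start    ? `EX_BUSY : `EX_IDLE;
            `EX_BUSY: next_state = finished ? `EX_DONE : `EX_BUSY;
            `EX_DONE: next_state = `EX_IDLE;
            default:  next_state = `EX_IDLE;   // recover from illegal states
        endcase
    end

    // State Register with synchronous reset, kept apart from the
    // Next State Logic above.
    always @(posedge clk)
        state <= reset ? `EX_IDLE : next_state;

    // With one-hot states, an output can be read from a single state bit.
    assign busy = state[`EX_BUSY_BIT];
endmodule
```

The bit-position defines are what keep the last line readable: the output
names the state it depends on instead of a bare index.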
*** Verilog Coding Guideline 5-5 ***
With the more complex Next State Logic already taken care of by the "Case
Statement" (see Verilog Coding Guideline 5-4) and with the help of one-hot
encoding for the state machine, the Output Logic can usually be implemented
easily with simple assign statements.

*** Verilog Coding Guideline 6-1 ***
In order to keep the Verilog files easy to read and easy to understand for
every member of the design team, adopt a standard format and use the same
format for all Verilog files.

*** Verilog Coding Guideline 6-2 ***
In order to avoid confusion about which wire is connected to which port, use
explicit connections (example: .port_name (wire)) when a module is
instantiated.

*** Verilog Coding Guideline 6-3 ***
In order to keep the Verilog code easy to understand for everyone (including
yourself :-), use detailed comments. More importantly, put in the comments
as you do the coding, because if you do not put in the comments now, it is
unlikely you will put them in later.

*** Verilog Coding Guideline 6-4 ***
Ideally, there should not be any "timescale" directive in any of the Verilog
files that model the hardware (they are not needed if we follow Verilog
Coding Guideline 2-2). Consequently, there should be ONE and only ONE
timescale directive in any Verilog simulation run, and that timescale
directive should be placed at the beginning of the test bench file (see
Reference [13]).

With all these logic design and Verilog coding guidelines, does this mean
there is no room for a logic designer to be creative? Not at all. Artists
such as movie directors and music composers need to follow many guidelines,
and yet nobody can say they are not doing creative work. They just spend
their creativity on tasks that require creativity and follow the standard
guidelines (such as a movie should be approximately 2 hours long) when
creativity is not needed.
Logic design is the same: be creative on tasks that truly deserve innovation
(such as how to build a datapath that can process data at half the power)
but not on tasks such as how to write a complex Verilog statement that saves
a few lines of Verilog code but that nobody else can understand.

The ultimate goal for any logic designer is to keep his or her design and
the Verilog code that models the design AS EASY TO UNDERSTAND AS POSSIBLE.
Remember this: the easier it is for other people to understand your design
and your Verilog code, the more people can help you in your work and the
less likely your vacation is to be interrupted by late-night phone calls
from the coworker covering for you :-)

So make your design easy to understand :-)

8. References
-----------------------------------------------------------------------------

[1] Private communications, October 2001.

[2] For those readers who can access my home directory, the Parallel ATA
    Interface to the disk is modeled by the module dataif in the Verilog
    file:
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v

[3] For those readers who can access my home directory, the Transport Layer
    is modeled by the module dtrans in the Verilog file:
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v

[4] For those readers who can access my home directory, the Link Layer is
    modeled by the module link in the Verilog file:
        /home/kong/P2001/Verilog/DeviceDongle/Link/link.v

[5] For those readers who can access my home directory, the files are in:
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_dp.v
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_ctl.v

[6] For those readers who can access my home directory, the files are in:
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_tx.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txdp.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rx.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxdp.v
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxctl.v

[7] For those readers who can access my home directory, the files are in:
        /home/kong/P2001/Verilog/DeviceDongle/Link/link.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_tx.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_txctl.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_rx.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_rxdp.v
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_rxctl.v

[8] For those readers who can access my home directory, link_txdp.v is in:
        /home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v

[9] For those readers who can access my home directory, link_library.v is
    in:
        /home/kong/P2001/Verilog/CommonFiles/link_library.v

[10] For those readers who can access my home directory, link_defs.v is in:
        /home/kong/P2001/Verilog/CommonFiles/link_defs.v
     Note: Both link_library.v (Reference [9] above) and link_defs.v are
     placed in the "CommonFiles" directory because they are used by all Link
     Layer files.

[11] For those readers who can access my home directory, trans_defs.v is in:
        /home/kong/P2001/Verilog/CommonFiles/trans_defs.v
     Note: The file trans_defs.v is placed in the "CommonFiles" directory
     because it is used by all Transport Layer files.

[12] For those readers who can access my home directory, dtrans_txctl.v is
     in:
        /home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v

[13] For those readers who can access my home directory, please refer to:
        /home/kong/P2001/Verilog/SATASys/Tests/test_init.v

------------------------ That's all for now folks :-) ------------------------