CS 755 - Spring 2002 Project Description
Design and VLSI Implementation of a 4-port, Fair, Double-Priority
Crossbar Switch Based on Virtual-Output-Queueing
Functional Specification and Constraints
You are to design and implement down to the layout level
a 4x4 crossbar switch for fixed-size packets.
The switch will use input-queueing,
but in order to avoid the Head-Of-Line (HOL) blocking effect
it will support Virtual-Output-Queueing,
i.e., a separate packet queue for each output port at each input port.
Two packet priorities (low and high) have to be supported.
The packet format is:
R<3:0> | X<1:0> | P<1> | C<1> | D<7:0>
- R<3:0>: The destination address. This is used to access
the routing table to determine the correct output port.
- X<1:0>: These extra bits are provided to enable implementation
of the various optional features described in the "Bells and Whistles"
section.
- P<1>: The priority of the packet (0 for low, 1 for high).
In the basic specification of the project, a high-priority
packet should always be transmitted before a low-priority
packet if both packets compete for the same output port.
- C<1>: The check bit of the packet. This bit must be set to
ensure that the transmitted packet has odd parity (an odd number of
bits are set). The switch should verify
the correctness of the check bit upon the arrival of the packet.
If the parity is incorrect, the switch should set an internal
parity_error status register and drop the packet.
- D<7:0>: The 8-bit packet payload.
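The field list above can be sketched in a few lines of Python, for example as a starting point for the functional simulator. This is only an illustration under stated assumptions: the fields are assumed to be packed with R in the most significant bits, the parity is assumed to cover all 16 bits of the packet, and the names encode_packet, decode_packet, and parity_ok are hypothetical.

```python
# Assumed layout, most significant bit first:
#   R<3:0> | X<1:0> | P | C | D<7:0>   (16 bits in total)
R_SHIFT, X_SHIFT, P_SHIFT, C_SHIFT = 12, 10, 9, 8


def popcount(word):
    """Return the number of bits set in a non-negative integer."""
    return bin(word).count("1")


def encode_packet(r, x, p, d):
    """Pack the fields and choose C so that the whole packet has odd parity."""
    if not (0 <= r < 16 and 0 <= x < 4 and p in (0, 1) and 0 <= d < 256):
        raise ValueError("field out of range")
    word = (r << R_SHIFT) | (x << X_SHIFT) | (p << P_SHIFT) | d
    c = 0 if popcount(word) % 2 == 1 else 1
    return word | (c << C_SHIFT)


def parity_ok(word):
    """True if the 16-bit packet has an odd number of bits set."""
    return popcount(word & 0xFFFF) % 2 == 1


def decode_packet(word):
    """Unpack a 16-bit packet into the tuple (r, x, p, c, d)."""
    return (
        (word >> R_SHIFT) & 0xF,
        (word >> X_SHIFT) & 0x3,
        (word >> P_SHIFT) & 0x1,
        (word >> C_SHIFT) & 0x1,
        word & 0xFF,
    )
```

A receiving port would call parity_ok on each arriving packet and, when it returns False, set the parity_error status register and drop the packet.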
Routing
Each switch contains a 16-entry by 2-bit routing table. When a packet
arrives, the 4-bit destination address is used to index into the
routing table, which returns a 2-bit output port number.
The routing table must be initialized before packets may be routed
through the switch. You are free to design the initialization logic
assuming a serial scanpath, parallel write path, or other approach.
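As a rough behavioral model of the routing step, the sketch below uses a parallel write path, which is only one of the initialization options mentioned above. The class and method names are hypothetical, and treating a lookup of an unwritten entry as an error is an assumption of the sketch, not a requirement of the specification.

```python
class RoutingTable:
    """Behavioral model of a 16-entry by 2-bit routing table."""

    NUM_ENTRIES = 16
    NUM_PORTS = 4

    def __init__(self):
        # None marks an entry that has not been initialized yet.
        self.entries = [None] * self.NUM_ENTRIES

    def write(self, address, port):
        """Parallel write path: store a 2-bit port number at a 4-bit address."""
        if not 0 <= address < self.NUM_ENTRIES:
            raise ValueError("address must fit in 4 bits")
        if not 0 <= port < self.NUM_PORTS:
            raise ValueError("port must fit in 2 bits")
        self.entries[address] = port

    def lookup(self, address):
        """Return the output port for a 4-bit destination address."""
        port = self.entries[address & 0xF]
        if port is None:
            raise RuntimeError("routing table entry not initialized")
        return port
```

For example, after table.write(5, 1), a packet whose R field is 5 is routed to output port 1.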
Switch Model
The basic switch model is shown in Figure 2.
Note that each input port has 8 FIFO queues, denoted as
O0-L, O0-H, O1-L, O1-H, O2-L, O2-H, O3-L, O3-H.
The notation `Ox-P' means: Output port x, priority class P.
Consequently, the switch has 32 queues in total, 8 for each input
port.
Figure 2: The basic switch model.
You may decide how you wish to implement the input queues. In some
designs, the input queues are only logical queues: in the actual
implementation, the 8 queues of each input port share a single
centralized pool of packet buffers.
Such a pool of buffers can better utilize the port memory resources.
For example, you might implement only 16 packet buffers to back
the 32 logical queues.
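The shared-pool idea can be illustrated with a small behavioral model of one input port. This is a sketch under assumptions: the InputPort class, its method names, and the free-list organization are hypothetical, and the pool size is left as a parameter because it is a design choice.

```python
from collections import deque


class InputPort:
    """One input port: 8 logical queues sharing a single pool of buffers."""

    def __init__(self, num_buffers):
        self.buffers = [None] * num_buffers       # shared packet storage
        self.free = deque(range(num_buffers))     # indices of unused buffers
        # One logical FIFO of buffer indices per (output port, priority).
        self.queues = {
            (out, pri): deque() for out in range(4) for pri in (0, 1)
        }

    def enqueue(self, out, pri, packet):
        """Store a packet; return False if the shared pool is exhausted."""
        if not self.free:
            return False
        index = self.free.popleft()
        self.buffers[index] = packet
        self.queues[(out, pri)].append(index)
        return True

    def dequeue(self, out, pri):
        """Remove and return the oldest packet of a queue, or None if empty."""
        queue = self.queues[(out, pri)]
        if not queue:
            return None
        index = queue.popleft()
        packet = self.buffers[index]
        self.buffers[index] = None
        self.free.append(index)                   # buffer returns to the pool
        return packet
```

Because any free buffer can serve any logical queue, a burst toward one output can use the whole pool, at the cost of the shared free list becoming a central resource.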
An alternative design uses packet buffers local to each input queue.
This type of design typically requires more buffers, but can run
at a faster frequency because it eliminates the shared resource and
facilitates a pipelined, hierarchical scheduler (i.e., select the next
packet from a local port, then globally arbitrate).
Performance Specification and Constraints
The clock frequency will not be specified, and it is up to the
designers to attempt to maximize it. You can achieve this with
any combination of architectural, logic-design, circuit-design, or
layout techniques. If you hope to achieve high performance, you
should consider pipelining your design.
A performance constraint, though, is that the switch should be
able to achieve the peak throughput of 4 packets per cycle whenever possible,
and whenever this does not conflict with the priority constraint.
In other words, if there are four packets in the four input ports,
respectively, destined to the four different output ports, the
switch should be able to deliver all these packets in the same cycle.
This constraint, however, is weaker than the constraint of servicing a
high priority packet before servicing any low priority packets.
Scheduler Specification and Constraints
Note that in each clock cycle only one input port can be connected
to an output port, and also, in each clock cycle only one input queue
of an input port can be serviced.
The switch scheduler determines at each clock cycle the
configuration of the switch (which input port is connected to
each output port), and also it determines which queue of each
input port will transmit a packet during that cycle.
In order to simplify the project, the scheduler can
be designed as centralized, i.e., it knows at any time the status
of all input queues. Or, for higher performance, it can be
made hierarchical.
The exact algorithm and architecture of the switch scheduler are
part of your work in this project. The constraint that you have to
follow, however, is that the scheduler should be fair,
i.e., it should not let some input queues (of the same priority
level) starve, because some other queues are always serviced first.
The exact interpretation of the fairness constraint, however, is
not specified here.
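As one possible starting point, and not the required algorithm, the sketch below models a centralized scheduler that serves high-priority requests first and uses round-robin pointers as one interpretation of fairness. All names are hypothetical. It is a greedy heuristic: it delivers four packets when the four inputs hold packets for four different outputs, but it does not compute a maximum matching in general, and no formal starvation-freedom guarantee is claimed.

```python
NUM_PORTS = 4
LOW, HIGH = 0, 1


def empty_state():
    """occupied[i][o][p] is True if input i holds a packet for output o
    in priority class p."""
    return [[[False, False] for _ in range(NUM_PORTS)]
            for _ in range(NUM_PORTS)]


class RoundRobinScheduler:
    def __init__(self):
        # For each (output, priority): the input that gets first pick next.
        self.pointer = {(o, p): 0
                        for o in range(NUM_PORTS) for p in (LOW, HIGH)}
        # The output that chooses first rotates every cycle.
        self.first_output = 0

    def schedule(self, occupied):
        """Return {output: (input, priority)} for one clock cycle."""
        grants = {}
        busy_inputs = set()
        order = [(self.first_output + k) % NUM_PORTS
                 for k in range(NUM_PORTS)]
        for pri in (HIGH, LOW):            # strict priority: HIGH pass first
            for out in order:
                if out in grants:
                    continue               # output already connected
                start = self.pointer[(out, pri)]
                for k in range(NUM_PORTS):
                    inp = (start + k) % NUM_PORTS
                    if inp not in busy_inputs and occupied[inp][out][pri]:
                        grants[out] = (inp, pri)
                        busy_inputs.add(inp)
                        # The granted input moves to the back of the line.
                        self.pointer[(out, pri)] = (inp + 1) % NUM_PORTS
                        break
        self.first_output = (self.first_output + 1) % NUM_PORTS
        return grants
```

Each input appears in at most one grant and each output receives at most one input, which matches the one-queue-per-input and one-input-per-output constraints stated above. A hierarchical design would split the inner loop into a per-port selection stage followed by global arbitration.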
Input and Output Interfaces
To keep the pin count low and model the multi-cycle
packet transmission of real systems, the input and output
interfaces assume a 4-bit parallel interface.
For simplicity, you may assume that the interface is synchronous
with the rest of the switch. In other words, the clock signals for both
the input interfaces and the switch core are the same, without any
phase difference.
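The packet fields listed earlier add up to 16 bits (4 + 2 + 1 + 1 + 8), so a 4-bit interface needs four transfers per packet. The sketch below shows one way to model this in a simulator. The transfer order (most significant nibble first) and the function names are assumptions, since the specification does not fix them.

```python
def to_nibbles(packet):
    """Split a 16-bit packet into four 4-bit transfers, most significant first."""
    return [(packet >> shift) & 0xF for shift in (12, 8, 4, 0)]


def from_nibbles(nibbles):
    """Reassemble a 16-bit packet from four 4-bit transfers."""
    if len(nibbles) != 4:
        raise ValueError("expected exactly four nibbles")
    word = 0
    for nibble in nibbles:
        word = (word << 4) | (nibble & 0xF)
    return word
```

If R is placed in the most significant bits, sending the most significant nibble first means the destination address arrives in the first transfer, so the routing table lookup could begin before the rest of the packet has been received.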
Bells and Whistles
This section describes some ideas for possible project extensions.
Larger groups (3 or 4 people) should plan on implementing at least
the ability to write the routing table.
- Many modern systems use control packets to identify and initialize
legal routes. For example, probe packets are used to determine whether a
switch or endpoint connects to a particular output port. Other control
packets read and write the routing table to initialize and check the
routing information remotely.
- Source-synchronous signalling. Many high-performance
interconnection network routers use source-synchronous signalling,
where each data output port sends a clock along with the data.
Each data input port is clocked by the received clock
signal and must then re-synchronize the data with the clock of the
internal router logic. The simplest approach is mesochronous
clocking, where all clocks have the same frequency (i.e., are
driven by a single master clock) but may have arbitrary phase
shifts.
- Loss-free flow control. Internet routers routinely drop packets
due to congestion. Conversely, interconnection network routers normally
implement flow control to prevent packet loss. Add additional signals
to the input/output interfaces or use the extra bits in the packet to
implement a lossless flow control scheme (e.g., sliding window).
- Virtual Channels to enable deadlock-free routing. Virtual channels
(aka Virtual Lanes) provide the abstraction of having multiple channels
between two routers, but share the same physical wires. This is typically
done by implementing separate buffers, but then arbitrating for the same
physical link.
Group Size, Milestones, Deadlines, and Report
The project should be done in groups of 2 to 4 people, with the
understanding that larger groups will have to provide additional
functionality, or strive for higher performance and/or denser designs.
There will be three phases in the project design:
- Milestone-1: April 12
- Complete the high-level design of your project, including determining
which Bells and Whistles you plan to include.
- Write a functional simulator (e.g., in C, C++, or Java) which will demonstrate
the operation of the switch. The simulator should include your
scheduling algorithm and should be written in a way that can be easily
used by the class instructors when evaluating the project.
The simulator will be used later for testing the correctness of your design,
by comparing the output sequences that it produces from arbitrary
input packet sequences with the answers that your design gives.
- Complete the logic design of the switch using the library
of standard cells and verify its correctness using QuickSim
simulations.
- Write a short report, explaining the scheduler algorithm and
your logic design.
- Include your tentative schedule and detailed task breakdown.
- Milestone-2: April 22
- Create the floorplan of the final layout. This includes determining
the components of the design that will be made in full-custom (or SDL),
and the components that will be made in standard-cells. Also, decide
the relative position of the chip ports, and the relative structure of
the global interconnects.
- If you have decided to optimize some design components with
faster circuits, you should have completed the circuit designs
and Accusim simulations by this point.
- Complete the full-custom layout of a single packet buffer,
taking into account that this layout cell will be later used
hierarchically to create the shared pool of packet buffers.
- Write a short report showing the progress of your work in this
phase.
- Include an updated schedule and detailed task breakdown.
- Final Deadline: May 13
- Complete the layout of the project. The layout should pass both
DRC and LVS checks.
- Determine the maximum possible clock frequency, using Accusim
simulations on the extracted netlist. Due to the large size of the
circuit you may not be able to run many simulations. Consequently,
attempt to identify the critical path of the design, and simulate
input sequences that excite that path.
- Write a report including the results of this last phase of the
project development.
A final demonstration of the project will be scheduled for
May 13th. All the group members should be present at the demo.
Readings
For a survey of the basics about switches see:
H.Ahmadi and W.E.Denzel,
"A Survey of Modern High-Performance Switching Techniques",
IEEE Journal on Selected Areas in Communications,
7(7):1091-1103, Sept. 1989.
To find more on virtual-output-queueing see:
For an implementation of such a switch see:
For a recent switch that uses local buffering, see:
Additional references may be given later.