CS 755 - Spring 2002 Project Description

Design and VLSI Implementation of a 4-port,
Fair, Double-Priority, Crossbar Switch based on Virtual-Output-Queueing

Functional Specification and Constraints

You are to design and implement, down to the layout level, a 4x4 crossbar switch for fixed-size packets. The switch will use input queueing, but to avoid the Head-Of-Line (HOL) blocking effect it will support Virtual-Output-Queueing, i.e., a separate packet queue for each output port at each input port. Two packet priorities (low and high) must be supported.

The packet format is:

Routing

Each switch contains a 16-entry by 2-bit routing table. When a packet arrives, the 4-bit destination address is used to index into the routing table, which returns a 2-bit output port number.

The routing table must be initialized before packets may be routed through the switch. You are free to design the initialization logic assuming a serial scanpath, parallel write path, or other approach.
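As an illustration of the lookup described above, here is a minimal behavioral sketch in Python. It is not a required design: the class name, the method names, and the choice of a parallel write path for initialization are all assumptions made for this example. Only the table dimensions (16 entries, 2 bits each, indexed by the 4-bit destination address) come from the specification.

```python
class RoutingTable:
    """Behavioral model of a 16-entry by 2-bit routing table (illustrative only)."""

    ENTRIES = 16   # one entry per 4-bit destination address
    MAX_PORT = 3   # each entry holds a 2-bit output port number (0..3)

    def __init__(self):
        # None marks an entry that has not been initialized yet.
        self._table = [None] * self.ENTRIES

    def write(self, dest_addr, out_port):
        """Assumed parallel write path: set one entry per call."""
        if not 0 <= dest_addr < self.ENTRIES:
            raise ValueError("destination address must fit in 4 bits")
        if not 0 <= out_port <= self.MAX_PORT:
            raise ValueError("output port must fit in 2 bits")
        self._table[dest_addr] = out_port

    def lookup(self, dest_addr):
        """Return the 2-bit output port for a 4-bit destination address."""
        if not 0 <= dest_addr < self.ENTRIES:
            raise ValueError("destination address must fit in 4 bits")
        port = self._table[dest_addr]
        if port is None:
            raise LookupError("routing table entry has not been initialized")
        return port


# Example initialization (the address-to-port mapping is arbitrary).
table = RoutingTable()
for addr in range(RoutingTable.ENTRIES):
    table.write(addr, addr % 4)
```

A model like this can serve as a reference when checking the simulated hardware: every lookup performed by the layout-level design should return the same port as the model.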

Switch Model

The basic switch model is shown in Figure 2. Each input port has 8 FIFO queues, denoted O0-L, O0-H, O1-L, O1-H, O2-L, O2-H, O3-L, and O3-H. The notation `Ox-P' means: Output port x, priority class P. Consequently, the switch has 32 queues in total, 8 for each input port.

Figure 2: The basic switch model.

You may decide how to implement the input queues. In some designs, the input queues are only logical queues: in the actual implementation, the 8 queues of each input port share a single centralized pool of packet buffers. Such a pool makes better use of the port's memory resources. For example, you might implement only 16 packet buffers to back the 32 logical queues.
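The shared-pool idea can be sketched as a set of linked lists threaded through one buffer array, with a free list for unused slots. The Python model below is only one possible organization. The names `SharedBufferPool` and `queue_index`, the pool size used in the example, and the mapping of Ox-P to a queue number are assumptions made for illustration.

```python
LOW, HIGH = 0, 1


def queue_index(out_port, priority):
    """Map Ox-P to a queue number 0..7 (assumed order: O0-L, O0-H, O1-L, ...)."""
    return 2 * out_port + priority


class SharedBufferPool:
    """Logical FIFO queues of one input port sharing a pool of packet buffers."""

    def __init__(self, num_buffers, num_queues):
        self.data = [None] * num_buffers   # packet storage
        self.next = [None] * num_buffers   # per-slot link to the next packet
        self.free = list(range(num_buffers))
        self.head = [None] * num_queues    # oldest packet of each queue
        self.tail = [None] * num_queues    # newest packet of each queue

    def enqueue(self, queue, packet):
        """Append a packet to a logical queue; return False if the pool is full."""
        if not self.free:
            return False
        slot = self.free.pop()
        self.data[slot] = packet
        self.next[slot] = None
        if self.head[queue] is None:
            self.head[queue] = slot
        else:
            self.next[self.tail[queue]] = slot
        self.tail[queue] = slot
        return True

    def dequeue(self, queue):
        """Remove and return the oldest packet of a queue, or None if it is empty."""
        slot = self.head[queue]
        if slot is None:
            return None
        packet = self.data[slot]
        self.head[queue] = self.next[slot]
        if self.head[queue] is None:
            self.tail[queue] = None
        self.data[slot] = None
        self.free.append(slot)
        return packet


# Example: 8 logical queues backed by only 4 buffers (sizes chosen arbitrarily).
pool = SharedBufferPool(num_buffers=4, num_queues=8)
pool.enqueue(queue_index(0, LOW), "pkt-a")
pool.enqueue(queue_index(2, HIGH), "pkt-b")
```

The trade-off is visible in the model: any queue can grow until the pool is exhausted, but every enqueue and dequeue touches the same head, tail, and free-list state, which in hardware becomes a shared resource.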

An alternative design uses packet buffers local to each input queue. This type of design typically requires more buffers, but it can run at a higher clock frequency because it eliminates the shared resource and facilitates a pipelined, hierarchical scheduler (i.e., first select the next packet locally at each port, then arbitrate globally).

Performance Specification and Constraints

The clock frequency will not be specified, and it is up to the designers to attempt to maximize it. You can achieve this with any combination of architectural, logic-design, circuit-design, or layout techniques. If you hope to achieve high performance, you should consider pipelining your design.

A performance constraint, though, is that the switch should achieve the peak throughput of 4 packets per cycle whenever possible, and whenever this does not conflict with the priority constraint. In other words, if each of the four input ports holds a packet and these packets are destined for four different output ports, the switch should be able to deliver all of them in the same cycle. This constraint, however, is weaker than the constraint of servicing a high-priority packet before servicing any low-priority packets.

Scheduler Specification and Constraints

Note that in each clock cycle only one input port can be connected to a given output port, and only one input queue of each input port can be serviced. At each clock cycle, the switch scheduler determines the configuration of the switch (which input port is connected to each output port) and which queue of each input port will transmit a packet during that cycle.
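The two per-cycle restrictions above amount to requiring that the set of connections made in one cycle is a matching between input ports and output ports. A small Python check, useful for validating scheduler output in simulation, might look like the following. The function name and the representation of a configuration as (input, output) pairs are assumptions made for this sketch.

```python
def is_valid_configuration(grants, num_ports=4):
    """Check one cycle's crossbar configuration.

    grants is a collection of (input_port, output_port) pairs. The
    configuration is valid when every port number is in range, no output
    is driven by more than one input, and no input sends more than one
    packet (only one of its queues can be serviced per cycle).
    """
    grants = list(grants)
    inputs = [i for i, _ in grants]
    outputs = [o for _, o in grants]
    in_range = all(0 <= p < num_ports for p in inputs + outputs)
    inputs_unique = len(set(inputs)) == len(inputs)
    outputs_unique = len(set(outputs)) == len(outputs)
    return in_range and inputs_unique and outputs_unique
```

For example, `[(0, 1), (1, 0), (2, 3), (3, 2)]` is a valid full configuration that moves four packets in one cycle, while `[(0, 1), (2, 1)]` is invalid because two inputs claim output 1.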

To simplify the project, the scheduler can be designed as centralized, i.e., it knows at any time the status of all input queues. Alternatively, for higher performance, it can be made hierarchical.

The exact algorithm and architecture of the switch scheduler are part of your work in this project. The constraint that you have to follow, however, is that the scheduler should be fair, i.e., it should not let some input queues (of the same priority level) starve because other queues are always serviced first. The exact interpretation of the fairness constraint, however, is not specified here.
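To make the constraints concrete, here is one possible centralized scheduler, sketched in Python. It is only an example of an approach, not the expected solution. It assumes strict priority (all high-priority requests are considered before any low-priority request) and interprets fairness as round-robin rotation, using one pointer per output and priority plus a rotating starting output. It matches greedily, so it does not always find the largest possible set of connections. All names are hypothetical.

```python
NUM_PORTS = 4
LOW, HIGH = 0, 1


def empty_occupancy():
    """occupancy[i][o][p] is True when input i holds a packet for output o at priority p."""
    return [[[False, False] for _ in range(NUM_PORTS)] for _ in range(NUM_PORTS)]


class RoundRobinScheduler:
    """Greedy strict-priority scheduler with round-robin pointers (illustrative)."""

    def __init__(self):
        # One round-robin pointer per (priority, output port).
        self.pointer = {(p, o): 0 for p in (LOW, HIGH) for o in range(NUM_PORTS)}
        # The output that gets first pick rotates every cycle.
        self.first_output = 0

    def schedule(self, occupancy):
        """Return this cycle's grants as a list of (input, output, priority)."""
        grants = []
        free_inputs = set(range(NUM_PORTS))
        free_outputs = set(range(NUM_PORTS))
        for prio in (HIGH, LOW):  # high priority is always served first
            for k in range(NUM_PORTS):
                out = (self.first_output + k) % NUM_PORTS
                if out not in free_outputs:
                    continue
                start = self.pointer[(prio, out)]
                for step in range(NUM_PORTS):
                    inp = (start + step) % NUM_PORTS
                    if inp in free_inputs and occupancy[inp][out][prio]:
                        grants.append((inp, out, prio))
                        free_inputs.discard(inp)
                        free_outputs.discard(out)
                        # The granted input becomes last in line for this output.
                        self.pointer[(prio, out)] = (inp + 1) % NUM_PORTS
                        break
        self.first_output = (self.first_output + 1) % NUM_PORTS
        return grants


# Example: inputs 0 and 1 both keep requesting output 0 at low priority.
scheduler = RoundRobinScheduler()
requests = empty_occupancy()
requests[0][0][LOW] = True
requests[1][0][LOW] = True
first_cycle = scheduler.schedule(requests)
second_cycle = scheduler.schedule(requests)
```

In this example the grant alternates between the two inputs on successive cycles. Whether rotation of this kind satisfies your chosen interpretation of fairness, and whether a greedy match meets the peak-throughput constraint in all cases, is something your design should argue explicitly.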

Input and Output Interfaces

To keep the pin count low and to model the multi-cycle packet transmission of real systems, the input and output interfaces are 4 bits wide. For simplicity, you may assume that the interfaces are synchronous with the rest of the switch. In other words, the input interfaces, the output interfaces, and the switch core share the same clock, without any phase difference.
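With a 4-bit interface, a packet crosses a port as a sequence of 4-bit words, one per clock cycle. The Python helpers below sketch that serialization for test-bench purposes. The packet length is left as a parameter, and sending the most significant nibble first is an assumption of this sketch. The function names are hypothetical.

```python
def to_nibbles(packet, num_nibbles):
    """Split an integer packet into 4-bit words, most significant nibble first."""
    if not 0 <= packet < (1 << (4 * num_nibbles)):
        raise ValueError("packet does not fit in the given number of nibbles")
    return [(packet >> (4 * (num_nibbles - 1 - k))) & 0xF
            for k in range(num_nibbles)]


def from_nibbles(nibbles):
    """Reassemble a packet from the 4-bit words received on successive cycles."""
    value = 0
    for nibble in nibbles:
        value = (value << 4) | (nibble & 0xF)
    return value
```

A packet of n nibbles therefore occupies a port for n consecutive clock cycles, which matters when you estimate how many cycles elapse between scheduling decisions.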

Bells and Whistles

This section describes some ideas for possible project extensions. Larger groups (3 or 4 people) should plan on implementing at least the ability to write the routing table.

Group Size, Milestones, Deadlines, and Report

The project should be done in groups of 2 to 4 people, with the understanding that larger groups will have to provide additional functionality or strive for higher performance and/or denser designs.

There will be three phases in the project design:

A final demonstration of the project will be scheduled for May 13th. All group members should be present at the demo.

Readings

For a survey of the basics about switches see:

To find more on virtual-output-queueing see:

For an implementation of such a switch see:

For a recent switch that uses local buffering, see:

Additional references may be given later.