Title: Complexity-Effective Superscalar Processors
Authors: Subbarao Palacharla, Norman P. Jouppi and J. E. Smith
Conf: ISCA 1997

Context of the paper, Motivation/ Problems looked at, Overview
of mechanisms proposed, Trade-offs, interesting results/ take-away points

Context: 
Early superscalar processors. Performance is a trade-off between hardware
complexity (which buys IPC) and clock speed.

Motivation:
	1) Brainiacs vs. speed demons. Two possibly conflicting goals: maximize 
	instructions in flight (issue window), maximize clock frequency.
	Study mechanisms that lead to increased ILP, and their impact
	on the clock, i.e., look at mechanisms on the critical path. 
	2) Analyze hardware structures at a micro-architectural level. 
	Characterize complexity w.r.t. implementation parameters (underlying technology)
	and micro-architectural parameters (window size, issue width).

Details:
	Paper looks at instruction dispatch and issue logic, and data bypass logic.
	The logic associated with these is likely to be a key limiter of clock speed. (Still true?)
	Considers a baseline superscalar model without reorder buffers.

Proposals:
	Dependence-based architecture that groups dependent instructions rather than independent 
	ones. Not the major contribution of the paper.
	
Complexity Analysis:
	
	Basic Structures looked at:
	Insn Dispatch - Register rename logic		
	Insn Issue - Wakeup logic, Selection logic
	Data Bypass - bypass logic

	Methodology:
	First, representative CMOS circuits for these hardware structures are selected.	
	Second, circuits are optimized for speed. 

	Register Rename logic:	
		RAM-based or CAM-based. CAM is less scalable because the number of CAM entries (= no. of physical regs)
		tends to increase with issue width (more instructions in flight need more physical registers).
		Window size is not a factor, and the issue width affects delay through its impact on wire lengths.
		** Wire delays will become increasingly important as feature sizes are reduced **

	Wakeup logic:
		2*IssueWidth comparators per instruction in the issue window. 
		Issue width has a greater impact on the delay than window size.
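
		A minimal sketch of the comparator count noted above. It assumes two source operands
		per instruction and one result tag broadcast per issue slot; the function name is
		hypothetical and only illustrates the arithmetic.

```python
def wakeup_comparators(issue_width: int, window_size: int) -> tuple[int, int]:
    """Return (comparators per window entry, comparators in the whole window)."""
    operands_per_insn = 2  # assumption: two source operands per instruction
    # Each operand tag is compared against every result tag broadcast in a cycle.
    per_entry = operands_per_insn * issue_width
    return per_entry, per_entry * window_size


for iw, win in [(4, 32), (8, 64)]:
    per_entry, total = wakeup_comparators(iw, win)
    print(f"IW={iw} window={win}: {per_entry} per entry, {total} total")
```

		The per-entry count depends only on issue width; window size only multiplies the total.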
		
	Selection logic:
		Tree-based scheme.
		Delay increases logarithmically with window size. Total delay scales well with feature size.
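
		A rough sketch of why the tree gives logarithmic delay: the number of arbiter levels
		is the tree depth. The fan-in of 4 per arbiter cell is an assumption made here for
		illustration, and the function name is hypothetical.

```python
def selection_tree_levels(window_size: int, fan_in: int = 4) -> int:
    """Number of arbiter levels needed to cover window_size request inputs."""
    levels = 0
    covered = 1  # inputs a tree of the current depth can cover
    while covered < window_size:
        covered *= fan_in
        levels += 1
    return levels


for win in (16, 32, 64, 128):
    print(f"window={win}: {selection_tree_levels(win)} levels")
```

		Requests travel up the tree and the grant travels back down, so delay tracks the
		level count rather than the window size itself.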
	
	Data Bypass logic:
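		Number of bypass paths grows quadratically with issue width.
		Bypass delay also grows quadratically with issue width: result-wire length grows
		linearly with issue width, and wire delay grows quadratically with wire length.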
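
		A small sketch of both quadratic trends. It assumes a fully bypassed design in which
		every result (IW per stage, S stages after the first result-producing stage) can feed
		either of the 2 operand inputs of every functional unit, and a distributed RC wire
		whose length is proportional to issue width. Function names and the base width of 4
		are hypothetical.

```python
def bypass_paths(issue_width: int, stages_after_result: int = 1) -> int:
    """Paths = 2 operand inputs * IW consumers * IW producers * S stages."""
    return 2 * issue_width ** 2 * stages_after_result


def relative_wire_delay(issue_width: int, base_width: int = 4) -> float:
    """Delay relative to base_width, assuming length ~ IW and delay ~ length^2."""
    return (issue_width / base_width) ** 2


for iw in (4, 8):
    print(f"IW={iw}: {bypass_paths(iw)} paths, "
          f"{relative_wire_delay(iw):.1f}x wire delay vs IW=4")
```

		The delay figure is relative only; absolute numbers depend on the technology and layout.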
		
Complexity-Effective Microarchitectures:
	
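	Nice idea: replace the issue window with FIFOs, each holding a chain of dependent
	instructions, so only the FIFO heads need to be considered for issue. Dependences are
	determined early, at dispatch. Loses some performance (IPC) but allows a better clock speed.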
	Clustering the dependence-based microarchitecture. Variants:
		- Single window, execution-driven steering
		- Two windows, dispatch-driven steering
		- Two windows, random steering
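
	A simplified sketch of steering dependent instructions into FIFOs at dispatch. This is
	an illustration under stated assumptions, not the paper's exact heuristic: nothing issues
	while steering happens, an instruction follows a producer only if that producer is
	currently at the tail of its FIFO, and dispatch simply stops when no empty FIFO is left.
	The names steer, producer_fifo and prog are hypothetical.

```python
from collections import deque


def steer(instructions, num_fifos=4):
    """Steer (dest, srcs) instructions, given in program order, into FIFOs.

    Returns the FIFO index chosen for each instruction, stopping at the
    first instruction that would stall for lack of an empty FIFO.
    """
    fifos = [deque() for _ in range(num_fifos)]
    producer_fifo = {}  # register -> index of the FIFO holding its latest producer
    placement = []
    for dest, srcs in instructions:
        target = None
        for src in srcs:  # try operands in order, left first
            f = producer_fifo.get(src)
            # Follow the producer only if it is the tail of its FIFO.
            if f is not None and fifos[f] and fifos[f][-1][0] == src:
                target = f
                break
        if target is None:
            empty = [i for i, q in enumerate(fifos) if not q]
            if not empty:
                break  # no empty FIFO: dispatch stalls here
            target = empty[0]
        fifos[target].append((dest, srcs))
        producer_fifo[dest] = target
        placement.append(target)
    return placement


prog = [
    ("r1", []),            # no pending operands: new FIFO
    ("r2", ["r1"]),        # producer of r1 is a FIFO tail: follow it
    ("r3", []),            # independent: new FIFO
    ("r4", ["r1"]),        # producer of r1 is no longer a tail: new FIFO
    ("r5", ["r3", "r2"]),  # left operand's producer is a tail: follow it
]
print(steer(prog))
```

	Because each FIFO holds a dependence chain, only the instruction at each head can be
	ready, which is what keeps the wakeup and select logic small.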

Take-away points:
	1) Window logic and data bypass are on the critical path. As you improve these, other structures
	could become critical. 
	2) Design complexity-effective structures, i.e., those that give good ILP while facilitating
	a faster clock.