Subbarao Palacharla, Norman P. Jouppi and J. E. Smith. Complexity-effective Superscalar Processors, Proc. 24th Annual International Symposium on Computer Architecture, June 1997, pp. 206-218.
paper measures complexity as the delay of the critical path through a structure
logic associated with the issue window and data bypasses causes the max delay
R10000 and DEC 21264 : physical register file holds both committed and non-committed values, active list tracks instruction order
PowerPC, HP PA-8000, HAL SPARC 64 : ROB holds non-committed renamed register values.
> main difference : size of the register file.
IW == issue width
checkpointing rename logic
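RAM scheme : map table indexed by logical register, entry gives the physical register directly
Trename = Tdecode + Twordline + Tbitline + Tsenseamp
Tdecode, Twordline, Tbitline : each of the form c0 + c1 x IW + c2 x IW^2
CAM scheme : one entry per physical register, holding the logical register mapped to it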
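The rename delay model above can be sketched as a small calculation. The coefficient values below are made-up placeholders (not numbers from the paper), and treating Tsenseamp as a constant is a simplifying assumption for illustration only.

```python
def quadratic_delay(issue_width, c0, c1, c2):
    """One delay component of the form c0 + c1*IW + c2*IW^2."""
    return c0 + c1 * issue_width + c2 * issue_width ** 2

# Hypothetical coefficients in arbitrary time units, NOT taken from the paper.
COEFFS = {
    "decode":   (100.0, 5.0, 0.5),
    "wordline": (50.0, 4.0, 0.4),
    "bitline":  (80.0, 6.0, 0.6),
}
T_SENSEAMP = 60.0  # hypothetical; held constant here as a simplification

def rename_delay(issue_width):
    """Trename = Tdecode + Twordline + Tbitline + Tsenseamp."""
    total = T_SENSEAMP
    for c0, c1, c2 in COEFFS.values():
        total += quadratic_delay(issue_width, c0, c1, c2)
    return total

for iw in (2, 4, 8):
    print(iw, rename_delay(iw))
```

With any positive c2 the IW^2 term eventually dominates, which is the point of the model: doubling issue width more than doubles the issue-width-dependent part of the delay.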
Wakeup logic
issue window > CAM array holding one instr per entry.
Delay = Ttagdrive + Ttagmatch + TmatchOR > each term grows quadratically (in window size and/or issue width).
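A behavioural sketch of the wakeup step, assuming two source operands per instruction. The class and function names are hypothetical, and this models only what the logic computes (tag drive, tag match, match OR), not its timing.

```python
from dataclasses import dataclass, field

@dataclass
class WindowEntry:
    name: str
    src_tags: tuple  # tags of the two source operands; None = already available
    ready: list = field(default_factory=lambda: [False, False])

    def wakeup(self, result_tags):
        # Tag match: compare every broadcast tag against each operand tag,
        # then OR the per-tag match results for that operand.
        for i, tag in enumerate(self.src_tags):
            if tag is None or any(tag == r for r in result_tags):
                self.ready[i] = True

    def is_ready(self):
        return all(self.ready)

def broadcast(window, result_tags):
    """Tag drive: present the result tags to every entry in the window."""
    for entry in window:
        entry.wakeup(result_tags)
    return [e.name for e in window if e.is_ready()]

window = [
    WindowEntry("add", (1, 2)),
    WindowEntry("mul", (3, None)),
    WindowEntry("sub", (1, None)),
]
print(broadcast(window, [1]))     # only "sub" has all operands
print(broadcast(window, [2, 3]))  # now every entry is ready
```

Every entry compares against every broadcast tag, so the number of comparators scales with window size times issue width; that is where the delay growth comes from.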
Selection logic
Tselc = c0 + c1 x log4(WINSIZE)
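The log4 term can be read as the depth of a tree of 4-input arbiter cells. A rough sketch under that assumption follows; it rounds up to whole tree levels, and c0 and c1 are placeholder values, not figures from the paper.

```python
def arbiter_tree_levels(window_size, fan_in=4):
    """Levels needed for a tree of fan_in-input arbiters to cover the window."""
    levels = 0
    capacity = 1
    while capacity < window_size:
        capacity *= fan_in
        levels += 1
    return levels

def selection_delay(window_size, c0, c1):
    """Tselc = c0 + c1 * log4(WINSIZE), with log4 rounded up to whole levels."""
    return c0 + c1 * arbiter_tree_levels(window_size)

for winsize in (16, 32, 64, 128):
    # c0=10.0 and c1=5.0 are hypothetical coefficients.
    print(winsize, arbiter_tree_levels(winsize), selection_delay(winsize, 10.0, 5.0))
```

Selection delay therefore grows only logarithmically: going from 16 to 64 entries adds a single tree level.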
Data Bypass Logic
S pipestages after the result-producing stage > 2 x IW^2 x S bypass paths (assuming 2-input func units)
datapath and control
Twire = 0.5 x R x C x L^2 (R, C : resistance and capacitance per unit length, L : length of wire)
Even if alternative layouts are used : the quadratic growth of bypass delay remains
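A quick numeric check of the two bypass formulas above. The function names and the sample R, C, L values are illustrative assumptions, not data from the paper.

```python
def bypass_paths(issue_width, stages, inputs_per_fu=2):
    """Number of bypass paths: inputs_per_fu * IW^2 * S."""
    return inputs_per_fu * issue_width ** 2 * stages

def wire_delay(r_per_len, c_per_len, length):
    """Distributed RC wire delay: 0.5 * R * C * L^2."""
    return 0.5 * r_per_len * c_per_len * length ** 2

# Doubling issue width quadruples the number of paths.
print(bypass_paths(4, 1), bypass_paths(8, 1))

# Doubling wire length quadruples the delay (R and C here are arbitrary).
print(wire_delay(2.0, 3.0, 4.0), wire_delay(2.0, 3.0, 8.0))
```

Both quantities are quadratic, one in issue width and one in wire length, and result wires get longer as more functional units are added.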
reg file/bypass within a cluster : single cycle, across clusters : multiple cycles.
>> biggest problems > window logic and bypasses
>> Atomic operations (cannot be pipelined if dependent instructions are to execute in consecutive cycles)
wakeup and select
data bypassing
New Microarch defn
dependence based microarchitecture
issue window > replaced by a small number of FIFOs.
each FIFO holds a chain of dependent instructions, issued in order
bypass is propagated to the head of the instr queue
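FIFO steering for an instruction I :
all operands of I are ready : new empty FIFO
1 outstanding operand : if the producer Isource is at the tail of its FIFO, put I just behind Isource, else new empty FIFO
two outstanding : apply the rule above to the left operand, then to the right operand, else new empty FIFO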
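The steering rules above can be sketched as follows. The class and method names are hypothetical, and returning None to signal a dispatch stall when no empty FIFO is left is an assumption of this sketch rather than something stated in these notes.

```python
from collections import deque

class FifoSteering:
    def __init__(self, num_fifos):
        self.fifos = [deque() for _ in range(num_fifos)]

    def _fifo_with_tail(self, producer):
        # A FIFO is suitable only if the producer is its last entry,
        # so the consumer lands directly behind it.
        for idx, fifo in enumerate(self.fifos):
            if fifo and fifo[-1] == producer:
                return idx
        return None

    def _empty_fifo(self):
        for idx, fifo in enumerate(self.fifos):
            if not fifo:
                return idx
        return None

    def steer(self, instr, outstanding_producers):
        """Place instr; outstanding_producers lists the producers of its
        not-yet-ready operands, left operand first. Returns the FIFO index,
        or None when no FIFO is available (assumed to mean a stall)."""
        for producer in outstanding_producers:
            idx = self._fifo_with_tail(producer)
            if idx is not None:
                self.fifos[idx].append(instr)
                return idx
        idx = self._empty_fifo()
        if idx is not None:
            self.fifos[idx].append(instr)
        return idx

s = FifoSteering(num_fifos=2)
print(s.steer("i1", []))            # all operands ready: new empty FIFO
print(s.steer("i2", ["i1"]))        # i1 is a tail: goes behind it
print(s.steer("i3", ["i1"]))        # i1 no longer a tail: new empty FIFO
print(s.steer("i4", ["i1", "i3"]))  # left fails, right operand succeeds
print(s.steer("i5", []))            # no empty FIFO left
```

Because each FIFO holds one dependence chain, only the FIFO heads have to be considered for issue, instead of every entry in a window.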
clustered dependence-based architecture
group issue and exec units into clusters (these can be fixed/windows)
PEW : parallel execution windows
Conclusions :
parameters that affect delay most > issue width and issue window size.
data bypass logic
may become critical in future > register files and caches