(2.1.4) Multiscalar Processors


Gurindar S. Sohi, Scott E. Breach, and T.N. Vijaykumar, Multiscalar Processors, Proc. 22nd Annual International Symposium on Computer Architecture, June 1995, pp. 414-425.


Static program > control flow graph (CFG).
Dynamic program execution > walking through the program's CFG, generating a dynamic sequence of basic blocks.
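
As a rough illustration of the static/dynamic distinction, the sketch below walks a small CFG and records the dynamic sequence of basic blocks. The CFG, the block names, and the list of branch outcomes are all made up for the example; they are not taken from the paper.

```python
# Hypothetical CFG: block -> (successor if branch taken, successor if not taken).
CFG = {
    "A": ("B", "C"),
    "B": ("D", "D"),
    "C": ("D", "D"),
    "D": ("A", "EXIT"),   # loop back to A or leave
}

def walk(cfg, entry, outcomes):
    """Return the dynamic sequence of basic blocks for the given branch outcomes."""
    trace = []
    block = entry
    outcomes = iter(outcomes)
    while block != "EXIT":
        trace.append(block)
        taken = next(outcomes)            # one branch outcome per executed block
        block = cfg[block][0] if taken else cfg[block][1]
    return trace

trace = walk(CFG, "A", [True, True, True, False, True, False])
print(trace)   # ['A', 'B', 'D', 'A', 'C', 'D']
```

The static program has four blocks; the dynamic sequence has six entries because the loop body runs twice along different paths.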

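Speculation : to achieve high ILP, a load is issued speculatively, on the assumption that no predecessor task stores to the same address; if a predecessor task does store there later, the speculation was wrong and has to be undone.
1) control speculation
2) data speculation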

Multiscalar programs 
The actual code
details of the structure of the CFG
communication characteristics of individual tasks
task descriptor : which registers a task may produce (the create mask). The mask is conservative: it covers every register the task might write on any path.
operate-and-forward instruction : once a task has produced a value, it can forward the result to the following processing units.
dead memory value analysis to release registers/memory for future tasks
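
The create mask, forwarding, and release ideas above can be sketched as follows. This is a minimal model under stated assumptions: a task is represented as a list of alternative paths, each path being the ordered list of registers it writes. The function names and the register names are hypothetical.

```python
def create_mask(paths):
    """Conservative create mask: every register written on any path through the task."""
    mask = set()
    for path in paths:
        mask.update(path)
    return mask

def forward_points(path):
    """Index of the last write to each register on one path; that write can forward its value."""
    last = {}
    for i, reg in enumerate(path):
        last[reg] = i          # later writes overwrite earlier indices
    return last

def release_set(mask, path):
    """Registers in the mask that this path never wrote; they are released so successors stop waiting."""
    return mask - set(path)

# Hypothetical task with two paths.
paths = [["r1", "r2", "r1"], ["r1", "r3"]]
mask = create_mask(paths)
print(sorted(mask))                            # ['r1', 'r2', 'r3']
print(forward_points(paths[0]))                # {'r1': 2, 'r2': 1}
print(sorted(release_set(mask, paths[0])))     # ['r3']
```

On the first path, r3 is in the mask but is never written, so a successor waiting on r3 would stall unless the register is released.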

Hardware required :
Scheduler
Address Resolution Buffer (ARB) : sits in front of the data cache and holds speculative loads/stores
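
To make the role of the Address Resolution Buffer concrete, here is a much simplified sketch. It assumes tasks are numbered in program order (smaller number = older task), uses a dict in place of the data cache, and omits commit and capacity handling. The class layout and method names are illustrative, not the paper's design.

```python
class ARB:
    def __init__(self, memory):
        self.memory = memory   # stands in for the data cache
        self.stores = {}       # addr -> {task_id: speculative value}
        self.loads = {}        # addr -> task_ids that loaded from an older source

    def load(self, task, addr):
        by_task = self.stores.get(addr, {})
        if task in by_task:
            return by_task[task]              # own store: no cross-task speculation
        self.loads.setdefault(addr, set()).add(task)
        older = [t for t in by_task if t < task]
        if older:
            return by_task[max(older)]        # nearest predecessor store
        return self.memory.get(addr, 0)       # fall through to the cache

    def store(self, task, addr, value):
        """Buffer the store; return the oldest later task that must be squashed, or None."""
        self.stores.setdefault(addr, {})[task] = value
        # Conservative check: any later task that already loaded this address is suspect.
        victims = [t for t in self.loads.get(addr, ()) if t > task]
        return min(victims) if victims else None

arb = ARB({0x10: 5})
first = arb.load(2, 0x10)             # task 2 loads early, sees 5 from memory
squash_from = arb.store(1, 0x10, 9)   # older task 1 stores later: task 2 read a stale value
reloaded = arb.load(2, 0x10)          # after re-execution, task 2 sees 9
print(first, squash_from, reloaded)   # 5 2 9
```

The check in `store` is deliberately conservative: it flags a later load even when an intervening task had already supplied the value.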
 
What happens to cycles?
Non-useful computation cycles (work that is later squashed); to reduce them:
Synchronization of data communication
Early validation of predictions
No-computation cycles (a unit waits), caused by:
Inter- and intra-task dependences
Load balancing
 
 
How multiscalar differs?
Branch prediction accuracy is less of a limit: the compiler knows the CFG, so prediction between tasks (for task scheduling) can be almost 100% accurate, regardless of prediction inside a task.
n instructions => n^2 complexity to perform dependence cross-checks in a single window. (Not quite necessary across the whole window once it is split into tasks.)
Loads and stores inside a task can follow whatever memory model the task's unit uses.
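
The n^2 remark can be checked with a little arithmetic. The sketch below counts pairwise dependence checks for one window of n instructions, and for the same n instructions split evenly over several units with checks done only inside each unit. The even split and the function names are assumptions for illustration; dependences between tasks still have to be handled by other means, such as the create mask and forwarding described earlier.

```python
def cross_checks(n):
    """Pairwise dependence checks among n instructions: n*(n-1)/2, which grows as n^2."""
    return n * (n - 1) // 2

def partitioned_checks(n, units):
    """Same n instructions split evenly over `units` units, checked only within each unit."""
    per_unit = n // units
    return units * cross_checks(per_unit)

print(cross_checks(32))            # 496
print(partitioned_checks(32, 4))   # 112
```

Splitting a 32-instruction window into 4 tasks of 8 cuts the pairwise checks by more than a factor of four.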
 
vs multithreaded : a multithreaded processor runs multiple threads, or loci of control, which are control independent and (typically) data independent.
Tasks executing on a multiscalar processor are related as different parts of a sequential walk through the same program, and are not control and data independent.

granularity of tasks matters

Summary
exploiting fine-grain ILP
combination of hw/sw
dividing the program CFG into tasks, stepping through the CFG speculatively, taking large steps, a task at a time, without pausing to inspect the contents of a task.
the processor complex uses multiple PCs to sequence through different parts of the CFG simultaneously.
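
The task-at-a-time walk summarized above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the task-level CFG, the rule that the first listed successor is the predicted one, the queue of units, and the squash policy. Each queue entry stands for a unit with its own PC; the head is the oldest, non-speculative task.

```python
from collections import deque

# Hypothetical task-level CFG: task -> possible successors (first one is the prediction).
TASK_CFG = {"T0": ["T1", "T2"], "T1": ["T3"], "T2": ["T3"], "T3": ["END"]}

def run(task_cfg, entry, actual_next, num_units=4):
    """Walk the task CFG speculatively; return (retired tasks, number of squashed tasks)."""
    units = deque([entry])
    retired, squashed = [], 0
    while units:
        # Fill free units by predicting successors, without inspecting task contents.
        while len(units) < num_units:
            predicted = task_cfg[units[-1]][0]
            if predicted == "END":
                break
            units.append(predicted)
        head = units.popleft()            # oldest task completes and retires
        retired.append(head)
        correct = actual_next[head]
        if units and units[0] != correct:
            squashed += len(units)        # misprediction: squash every later task
            units.clear()
        if not units and correct != "END":
            units.append(correct)         # restart from the correct successor
    return retired, squashed

result = run(TASK_CFG, "T0", {"T0": "T2", "T2": "T3", "T3": "END"})
print(result)   # (['T0', 'T2', 'T3'], 2)
```

In this run the sequencer predicts T1 after T0, the real successor is T2, and the two speculative tasks behind T0 are squashed before the walk resumes.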