Gurindar S. Sohi, Scott E. Breach, and T.N. Vijaykumar, Multiscalar Processors, Proc. 22nd Annual International Symposium on Computer Architecture, June 1995, pp. 414-425.
Static program: represented as a control flow graph (CFG).
Dynamic program execution: a walk through the program's CFG, generating a dynamic sequence of basic blocks.
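The static/dynamic distinction above can be sketched in a few lines. This is only an illustration: the block names, the toy CFG, and the `choose` callback (standing in for run-time branch outcomes) are all made up for these notes and do not come from the paper.

```python
# Hypothetical CFG: each basic block maps to its possible successors.
CFG = {
    "entry": ["loop"],
    "loop": ["body", "exit"],   # conditional branch: stay in the loop or leave
    "body": ["loop"],
    "exit": [],
}

def walk(cfg, start, choose):
    """Yield the dynamic sequence of basic blocks for one execution.

    `choose(block, successors)` stands in for the branch outcome at run time.
    """
    block = start
    while True:
        yield block
        successors = cfg[block]
        if not successors:
            return
        block = choose(block, successors)

def three_iterations():
    """Branch outcomes for a run that executes the loop body three times."""
    remaining = 3

    def choose(block, successors):
        nonlocal remaining
        if block == "loop":
            if remaining > 0:
                remaining -= 1
                return "body"
            return "exit"
        return successors[0]

    return choose

# One static CFG, one particular dynamic sequence of basic blocks.
trace = list(walk(CFG, "entry", three_iterations()))
print(trace)
```

The same four-block CFG yields a different trace for every different set of branch outcomes, which is the sense in which execution is a walk over a fixed graph.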
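Speculation: to achieve high ILP, tasks are executed speculatively. A load is issued on the assumption that no predecessor task will store to the same location; if one does, the later task is squashed.
1) control speculation
2) data speculation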
Multiscalar programs must provide:
The actual code
Details of the structure of the CFG
Communication characteristics of individual tasks
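task descriptor : which registers a task may produce (create mask). Conservative: it covers every register that might be written on any path through the task.
operate-and-forward instruction : once a task has made its last update to a register, the result can be forwarded to the following processing units.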
dead memory value analysis to release registers/memory for future tasks
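A rough sketch of the register bookkeeping described above, under stated assumptions: the task, its two paths, and the register names are hypothetical, and "release" is read here as telling later tasks that a register in the create mask was not written on the path actually taken. It is not the paper's compiler algorithm.

```python
# Hypothetical task: each path through the task is a list of
# (destination_register, opcode) pairs. Names are illustrative only.
TASK_PATHS = {
    "then_path": [("r1", "add"), ("r2", "load"), ("r1", "sub")],
    "else_path": [("r1", "add"), ("r3", "mul")],
}

def create_mask(paths):
    """Conservative create mask: every register written on ANY path."""
    mask = set()
    for instructions in paths.values():
        mask.update(dest for dest, _ in instructions)
    return mask

def last_writers(instructions):
    """Index of the last write to each register on one path.

    These are the instructions whose results could be forwarded early.
    """
    last = {}
    for index, (dest, _) in enumerate(instructions):
        last[dest] = index
    return last

def to_release(paths, taken):
    """Registers in the create mask that the taken path never wrote."""
    written = {dest for dest, _ in paths[taken]}
    return create_mask(paths) - written

mask = create_mask(TASK_PATHS)
forward_points = last_writers(TASK_PATHS["then_path"])
released = to_release(TASK_PATHS, "else_path")
```

The mask is a union over paths, which is why it is conservative: on `else_path` the register `r2` is in the mask but never written, so a later task waiting on it has to be told it can stop waiting.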
Hardware required :
Scheduler
Address Resolution Buffer (ARB) : sits in front of the data cache; holds speculative loads/stores and detects memory dependence violations.
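A toy model of what the buffer has to track, to make the idea concrete. The class name `ToyARB`, its methods, and the addresses are invented for this sketch; the real structure is a hardware design, and this version is deliberately conservative (it may squash more tasks than strictly necessary).

```python
class ToyARB:
    """Toy address resolution buffer; not the paper's hardware design.

    Tasks are numbered in sequential program order (lower = older).
    """

    def __init__(self):
        self.loads = {}   # address -> set of task ids that loaded it
        self.stores = {}  # address -> {task id: value}

    def load(self, task, address, memory):
        """Record a speculative load and return the value it sees."""
        self.loads.setdefault(address, set()).add(task)
        # Prefer the buffered store from the nearest task that is not younger.
        visible = {t: v for t, v in self.stores.get(address, {}).items() if t <= task}
        if visible:
            return visible[max(visible)]
        return memory.get(address, 0)

    def store(self, task, address, value):
        """Buffer the store; return younger tasks that must be squashed."""
        self.stores.setdefault(address, {})[task] = value
        # Any younger task that already loaded this address read a stale value.
        return sorted(t for t in self.loads.get(address, set()) if t > task)

memory = {0x40: 7}
arb = ToyARB()
value = arb.load(task=2, address=0x40, memory=memory)  # task 2 loads early
squash = arb.store(task=1, address=0x40, value=9)      # older task 1 stores later
retry = arb.load(task=2, address=0x40, memory=memory)  # re-executed load
```

Nothing reaches `memory` here; in this sketch the buffered stores would only be written back once a task is no longer speculative.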
Where do the cycles go ?
Non-useful computation cycles (work that is later squashed)
Synchronization and data communication
Early validation of predictions required.
No-computation cycles (a unit waits)
Inter- and intra-task dependences
Load balancing
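How does multiscalar differ ?
Branch prediction accuracy is less of a limit: the compiler knows the CFG, so only the branches between tasks need to be predicted to schedule tasks, and accuracy close to 100% is achievable there (prediction inside a task is a separate matter).
Issuing n instructions at once => n^2 complexity to perform dependence cross-checks. Multiscalar needs these checks only within a processing unit, not across tasks.
Loads and stores inside a task can be issued independently, without knowing the addresses of memory operations in predecessor or successor tasks.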
vs multithreaded : multiple threads, or loci of control, which are control independent and (typically) data independent.
Tasks executing on a multiscalar processor are related as different parts of a sequential walk through the same program, and are not control and data independent.
Task granularity matters.
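The first two points in the comparison above come down to simple arithmetic. A minimal sketch, with made-up numbers: 4 processing units, 2-wide issue each, and a 0.9 per-branch accuracy chosen only for illustration. It counts pairwise checks (n(n-1)/2, which grows as n^2) and assumes branch predictions are independent.

```python
def cross_checks(n):
    """Pairwise dependence checks among n instructions issued together."""
    return n * (n - 1) // 2

def window_accuracy(per_branch_accuracy, branches):
    """Chance that every prediction in a run of branches is right,
    assuming the predictions are independent."""
    return per_branch_accuracy ** branches

units, width = 4, 2                    # hypothetical machine shape
flat = cross_checks(units * width)     # one 8-wide issue window
split = units * cross_checks(width)    # checks stay inside each unit

# Predicting through 5 branches in a row vs. 1 task-level prediction.
deep = window_accuracy(0.9, 5)
shallow = window_accuracy(0.9, 1)
print(flat, split, round(deep, 3), shallow)
```

Fewer predictions per window and fewer instructions per cross-check group are the two levers; both depend on how the program is cut into tasks, which is why granularity matters.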
Summary
Exploiting fine-grain ILP
Combination of hw/sw
Dividing the program CFG into tasks, then stepping through the CFG speculatively, taking large steps, a task at a time, without pausing to inspect the contents of a task.
The processor complex uses multiple program counters (PCs) to sequence through different parts of the CFG simultaneously.
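As a closing illustration of "a task at a time" with multiple PCs: a sketch that walks a task-level CFG with a trivial predictor and hands each predicted task to the next processing unit in a queue. The task names, start addresses, and the always-first-successor predictor are assumptions made for these notes, not the paper's sequencer.

```python
from collections import deque

# Hypothetical task-level CFG: task -> (start_pc, possible successor tasks).
TASKS = {
    "A": (0x100, ["B"]),
    "B": (0x140, ["B", "C"]),   # loop task: repeats or falls through to C
    "C": (0x1A0, []),
}

def predict_next(task, successors):
    """Stand-in predictor: always guess the first listed successor."""
    return successors[0] if successors else None

def assign(tasks, first, num_units):
    """Fill a queue of processing units with predicted tasks, oldest first.

    The walk never looks inside a task; it only needs each task's start PC
    and its possible successors.
    """
    ring = deque()
    task = first
    while task is not None and len(ring) < num_units:
        start_pc, successors = tasks[task]
        ring.append({"unit": len(ring), "task": task, "pc": start_pc})
        task = predict_next(task, successors)
    return ring

ring = assign(TASKS, "A", num_units=4)
order = [entry["task"] for entry in ring]
pcs = [entry["pc"] for entry in ring]
```

Each entry carries its own PC, so four units are working on four different points of the sequential walk at once; the oldest entry is the only one that is not speculative.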