Data Speculation Support on a CMP

 

This paper discusses combined hardware and software support for data speculation on a chip multiprocessor (CMP). A CMP offers low-latency, high-bandwidth interconnects between its processors. This makes data speculation attractive, because thread synchronization, thread control, and dependency-violation checks all rely on inter-processor communication and can therefore be done quickly and efficiently.

 

Why thread-level data speculation?

In C integer programs, static pointer disambiguation is hard, so compilers struggle to analyze dependencies well enough to guarantee that threads are truly parallel. With thread-level speculation, the sequential program is instead partitioned into threads, each of which executes on its own processor, and dependency violations are checked for at run time rather than ruled out at compile time.
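To make the disambiguation problem concrete, here is a minimal sketch. It models memory as a Python list and pointers as integer offsets; the names scale and carries_dependence are hypothetical and are not taken from the paper. The same loop is fully parallel or fully serial depending only on the offset values, which a compiler generally cannot know when it sees the loop.

```python
def scale(mem, dst, src, n):
    """Sequential loop: mem[dst + i] = 2 * mem[src + i] for i in 0..n-1."""
    for i in range(n):
        mem[dst + i] = 2 * mem[src + i]


def carries_dependence(dst, src, n):
    """True if some iteration reads a location that an earlier iteration wrote."""
    written = set()
    for i in range(n):
        if src + i in written:
            return True
        written.add(dst + i)
    return False


# Same loop body, two different "pointer" arguments (assumed example values).
print(carries_dependence(dst=4, src=0, n=4))  # disjoint ranges -> False
print(carries_dependence(dst=1, src=0, n=4))  # overlapping ranges -> True
```

In the second call, iteration i reads what iteration i-1 wrote, so the iterations cannot simply be run in parallel without some form of run-time checking.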

 

Two techniques for speculative thread execution:

  1. Subroutine threads
  2. Loop Iteration

Subroutine threads:

A subroutine call causes a fork. The original head processor continues to execute non-speculatively so that it can handle exceptional situations (e.g., OS calls). RPBs are created and assigned to the processors that run speculatively.

Loop Iteration:

The system predicts which loop to speculate on and assigns 4 RPBs to that loop (quick set). Loops that contain subroutine calls are handled more carefully with respect to speculation. Only the innermost loop is executed speculatively.
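The idea of running loop iterations speculatively and checking for dependency violations can be sketched in software. The model below is an assumption-laden illustration, not the paper's hardware: iterations run in groups of four, each logs its loads and buffers its stores, and they commit in program order. An iteration that read an address written by an earlier member of its group is restarted. All names (execute, run_speculative, independent, chained) are hypothetical.

```python
def execute(memory, body, it):
    """Run one iteration speculatively: log loads, buffer stores privately."""
    reads, writes = set(), {}

    def load(addr):
        if addr in writes:          # forward the iteration's own buffered store
            return writes[addr]
        reads.add(addr)
        return memory[addr]

    def store(addr, value):
        writes[addr] = value        # not visible to others until commit

    body(it, load, store)
    return reads, writes


def run_speculative(memory, body, n_iters, n_procs=4):
    """Run the loop in groups of n_procs iterations; return the restart count."""
    restarts = 0
    for start in range(0, n_iters, n_procs):
        group = range(start, min(start + n_procs, n_iters))
        # All iterations of the group start from the same committed state.
        results = [execute(memory, body, it) for it in group]
        committed = set()           # addresses written by earlier group members
        for it, (reads, writes) in zip(group, results):
            if reads & committed:   # read a stale value: dependency violation
                restarts += 1
                reads, writes = execute(memory, body, it)
            for addr, value in writes.items():
                memory[addr] = value
            committed.update(writes)
    return restarts


def independent(i, load, store):
    store(i, 2 * load(i))           # touches only element i


def chained(i, load, store):
    store(i + 1, load(i) + 1)       # reads what the previous iteration wrote


a = list(range(8))
print(run_speculative(a, independent, 8), a)
b = [0] * 9
print(run_speculative(b, chained, 8), b)
```

In this model the independent loop commits with no restarts, while the chained loop restarts three of every four iterations, so the speculative processors contribute little beyond what sequential execution would achieve.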

 

The design uses a speculation coprocessor and modifies the data cache to support speculation.

 

Issues?

1. Memory traffic increases due to speculative loads. Commonly used variables are not register-allocated.

2. Processor utilization is very low for the 3rd and 4th speculative processors, as they spend most of their time waiting.

3. Could a more aggressive compiler help by moving the loads and stores as far as possible?