(3.5.1) Thread Level Speculation

J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. A Scalable Approach to Thread-Level Speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.

Parallelizing non-numeric and irregular numeric codes is difficult (complex control flow and memory access patterns).

TLS: automatically parallelize code
Epochs: each is time-stamped with an epoch number (ordering)
Track data dependences between epochs
Homefree token: once an epoch holds it, no violations are possible and the epoch can commit.
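
The epoch ordering and token passing above can be sketched as follows. This is only a minimal model, assuming epoch numbers are plain integers with a total order; the names `Epoch`, `is_logically_earlier`, and `commit_in_order` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    number: int              # epoch number: smaller means logically earlier (assumption)
    homefree: bool = False   # True while this epoch holds the homefree token
    committed: bool = False


def is_logically_earlier(a: Epoch, b: Epoch) -> bool:
    """True if epoch a precedes epoch b in the original sequential order."""
    return a.number < b.number


def commit_in_order(epochs):
    """Pass the homefree token from the oldest epoch to the youngest.

    Returns the epoch numbers in the order they committed.
    """
    order = []
    for epoch in sorted(epochs, key=lambda e: e.number):
        epoch.homefree = True    # token arrives: no earlier epoch can violate us now
        epoch.committed = True   # speculative work becomes permanent
        order.append(epoch.number)
        epoch.homefree = False   # token is handed on to the next epoch
    return order
```

Epochs may finish executing in any order, but in this sketch they commit strictly in epoch-number order, which is what preserves the sequential semantics.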

Objectives
Large-scale parallel machines (built from single-chip multiprocessors or SMT processors) should seamlessly perform TLS across the entire machine, even though communication costs differ
no recompilation

Detect data dependence violations at run time > leverage invalidation-based cache coherence.
If an invalidation arrives from a logically-earlier epoch for a line that we have speculatively loaded > violation, speculation has failed.
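
A minimal sketch of that check, assuming integer epoch numbers and a set of speculatively loaded line addresses per cache; `SpeculativeCache` and its methods are hypothetical names for illustration, not the paper's hardware interface.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeCache:
    epoch: int                                      # epoch number of the local thread
    spec_loaded: set = field(default_factory=set)   # lines loaded speculatively
    violated: bool = False

    def speculative_load(self, line: int) -> None:
        """Record that this epoch speculatively read the given line."""
        self.spec_loaded.add(line)

    def on_invalidation(self, line: int, sender_epoch: int) -> None:
        """Handle an invalidation tagged with the writer's epoch number."""
        if sender_epoch < self.epoch and line in self.spec_loaded:
            # A logically-earlier epoch wrote a line we already read,
            # so we consumed a stale value: speculation fails.
            self.violated = True
```

An invalidation from a logically-later epoch, or for a line that was never speculatively loaded, leaves `violated` unchanged in this model.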

Speculation level > the cache level at which speculation occurs

What's required
   (i) a notion of whether a cache line has been speculatively loaded and/or modified
   (ii) a guarantee that a speculative cache line will not be propagated to regular memory
        speculation fails if such a cache line is replaced!
   (iii) an ordering of all speculative memory references (epoch numbers and homefree token)

Hardware
Cache states
   Apart from Dirty, Shared, Exclusive and Invalid: speculatively loaded (SpL) and speculatively modified (SpM).
   A speculative line is not evicted until its epoch becomes homefree (if eviction is unavoidable, speculation fails).
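
One way to picture these states is a base coherence state plus two speculative flags. The sketch below assumes that encoding; `Base`, `Line`, and `try_evict` are hypothetical names, and the real protocol's state encoding may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Base(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    DIRTY = "D"


@dataclass
class Line:
    base: Base = Base.INVALID
    spec_loaded: bool = False     # SpL
    spec_modified: bool = False   # SpM

    @property
    def speculative(self) -> bool:
        return self.spec_loaded or self.spec_modified


def try_evict(line: Line, homefree: bool) -> bool:
    """Return True if the line was evicted safely.

    Returns False (and leaves the line untouched) when evicting it would
    lose speculative state, i.e. when speculation would have to fail.
    """
    if line.speculative and not homefree:
        return False
    line.base = Base.INVALID
    line.spec_loaded = False
    line.spec_modified = False
    return True
```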
Messages
   read-exclusive-speculative, invalidation-speculative, upgrade-request-speculative
   + each carries the epoch number of the requester
   > the speculative messages are only hints; the receiver is not obliged to act on them.

When speculation succeeds
   Instead of scanning all lines and changing speculative states to normal:
   Ownership required buffer (ORB) > gets an entry when a line becomes both speculatively modified and shared.
   When the homefree token arrives, generate an upgrade request for each entry in the ORB.
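
The commit path above might look roughly like this. It is a sketch only: `OwnershipRequiredBuffer`, `note_state`, and `on_homefree` are hypothetical names, and `send_upgrade` stands in for whatever mechanism issues the upgrade request.

```python
class OwnershipRequiredBuffer:
    """Tracks lines that are speculatively modified but only held shared."""

    def __init__(self):
        self.entries = []   # line addresses that need ownership at commit time

    def note_state(self, line_addr, spec_modified, shared):
        """Record the line if it is both speculatively modified and shared."""
        if spec_modified and shared and line_addr not in self.entries:
            self.entries.append(line_addr)

    def on_homefree(self, send_upgrade):
        """Issue one upgrade request per entry, then empty the buffer.

        Returns the number of upgrade requests issued.
        """
        issued = 0
        for addr in self.entries:
            send_upgrade(addr)
            issued += 1
        self.entries.clear()
        return issued
```

The point of the buffer is that commit work is proportional to the number of ORB entries rather than to the number of lines in the cache.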

Optimizations
   Forwarding data between epochs: use wait-signal synchronization
   Dirty and speculatively loaded (DSpL) state: a speculatively dirty line is never evicted anyway, so if you load and then modify a line, just keep it in the DSpL state
   Suspend epochs that have violations (resume when you get the homefree token)
   Support for multiple writers: combine results (ah!) using fine-grained SM bits (per byte/word in the cache line)
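
The multiple-writer case can be illustrated with per-word SM bits. This sketch assumes integer epoch numbers and that the logically-latest writer of a word wins; `merge_line` and its argument layout are hypothetical, not the paper's mechanism.

```python
def merge_line(committed, writers):
    """Combine speculative copies of one cache line.

    committed: list of word values currently in memory.
    writers:   list of (epoch_number, words, sm_bits) tuples, where
               sm_bits[i] is True if that epoch speculatively modified word i.
    Copies are applied in epoch order, so a later epoch's write to the same
    word overrides an earlier one; unmodified words are left alone.
    """
    result = list(committed)
    for _, words, sm_bits in sorted(writers, key=lambda w: w[0]):
        for i, modified in enumerate(sm_bits):
            if modified:
                result[i] = words[i]
    return result
```

For example, if epoch 1 modified words 0 and 2 and epoch 2 modified only word 2, the merged line takes word 0 from epoch 1, word 2 from epoch 2, and the remaining words from memory.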

Support in SMT machines
   (i) two epochs may not modify the same line
   (ii) a logically-earlier epoch (t-) must not see a logically-later epoch's (t+) modifications

Conclusions
8-75% speedups!
ORB overhead low (used to commit speculative modifications faster)