Supporting Fine-Grained Synchronization on a Simultaneous Multithreading
Processor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Major Points:
* Reduce synchronization cost for fine-grained threads on SMT processors.
* Implementation of a thread-shared hardware "lock box".
- Simple and scalable hardware design.
- Each lock box entry holds the lock address, a pointer to the instruction
acquiring the lock, and a valid bit (v-bit).
- Guarantees starvation prevention and deadlock avoidance.
* Speculative prediction of lock release.
- Prediction based on thread ID and PC history.
- Reduce the critical path from 15 cycles to 9 cycles.
- Gains an additional 40% in performance.
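As a rough illustration of the lock box described above, the following is a minimal software model, not the authors' hardware design. It assumes one entry per hardware thread, with the three fields listed in the notes. The `arrival` stamp, the `held` dictionary (standing in for lock state in memory), and the FIFO hand-off policy are assumptions added here to make starvation prevention concrete; all class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockBoxEntry:
    lock_addr: int = 0    # address of the lock this thread waits on
    instr_ptr: int = 0    # pointer to the blocked acquire instruction
    valid: bool = False   # v-bit: entry describes a blocked thread
    arrival: int = 0      # assumption: arrival stamp for FIFO hand-off


class LockBox:
    def __init__(self, num_threads: int):
        # One entry per hardware thread (assumed organization).
        self.entries = [LockBoxEntry() for _ in range(num_threads)]
        self.held = {}    # lock address -> owner thread id (models memory)
        self.clock = 0    # monotonically increasing arrival counter

    def acquire(self, tid: int, addr: int, pc: int) -> bool:
        """Return True if tid gets the lock at once, False if it blocks."""
        if addr not in self.held:
            self.held[addr] = tid
            return True
        self.clock += 1
        self.entries[tid] = LockBoxEntry(addr, pc, True, self.clock)
        return False

    def release(self, addr: int) -> Optional[int]:
        """Release addr; return the woken thread id, or None if no waiter."""
        waiters = [(e.arrival, tid) for tid, e in enumerate(self.entries)
                   if e.valid and e.lock_addr == addr]
        if not waiters:
            self.held.pop(addr, None)   # nobody waiting: lock becomes free
            return None
        _, tid = min(waiters)           # oldest waiter wins (assumed policy)
        self.entries[tid].valid = False
        self.held[addr] = tid           # ownership passes directly to waiter
        return tid
```

In this sketch a blocked thread does not spin: it sits in its entry until a release to the matching address wakes it, which is the property that makes the approach attractive on an SMT where spinning threads would consume shared issue slots.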
Synchronization Mechanisms:
* Spin locks built on instructions such as test-and-set or
load-locked/store-conditional.
* Full/empty bits associated with each memory block.
* Full/empty bits associated with registers.
* Shared registers for synchronization.
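To make the full/empty-bit mechanism in the list above concrete, here is a small sketch of a single memory word with a full/empty bit. It is a simplified model under stated assumptions: the class name `FullEmptyWord` and its methods are hypothetical, and a failed access simply returns a failure flag, whereas real hardware would typically stall or retry the access.

```python
class FullEmptyWord:
    """One memory word guarded by a full/empty bit (simplified model)."""

    def __init__(self):
        self.value = None
        self.full = False   # the full/empty bit; starts empty

    def try_write(self, value) -> bool:
        """Producer side: write only if empty, then mark the word full."""
        if self.full:
            return False    # assumption: hardware would stall here
        self.value = value
        self.full = True
        return True

    def try_read(self):
        """Consumer side: read only if full, then mark the word empty."""
        if not self.full:
            return (False, None)
        self.full = False
        return (True, self.value)
```

A producer and a consumer alternating on such a word synchronize on every transfer without a separate lock variable, which is why the scheme suits fine-grained communication.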
Goals for SMT synchronization:
* High performance.
* Resource-conservative.
* Deadlock-free.
* Scalable.
Evaluation:
* Trace-driven simulation. Can this give accurate performance results?
* Are locks really as frequent in applications as the authors claim?
* The authors find only 6 loops across several benchmarks and base their
conclusions on the performance of these loops.