##########################################################################
#
# Comments and Thoughts on
# "Lock behavior characterization of Commercial Workloads"
#
# Contributors: Ravi Rajwar, Mikko's group, Min Xu and us
#
# May 21, 2002
#
##########################################################################

## About the Conclusion ##

(*) For correctness, it's not necessary to identify user-level thread
    switches. Our conclusion on this is too strong to be correct.

## About our Methods ##

(*) Lock identification algorithm: Kevin gave an example of using one
    casa instruction to insert into a linked list. This can introduce
    errors in our current scheme, because (1) the locking and the
    critical section are implemented by a single instruction; (2) a
    following normal store can be mistakenly deemed the completing store
    of a critical section. We don't know how often this happens, though.

(*) Ravi suggested measuring the number of cycles spent on lock
    contention, since accesses to a lock usually induce dirty misses
    (100+ cycles).

(*) Jichuan once thought that descheduling a waiting thread is a good
    thing for throughput-oriented workloads (where fine-grained locks
    abound), but Ravi pointed out that scheduling is not free; the
    overhead itself can be large. (IEEE Micro had a paper on OLTP
    optimization issues, see [+]: they actually restrain thread yielding
    on the IBM machine and only switch on I/O.)

(*) The cache behavior of lock accesses can itself affect performance,
    so sometimes using one coarse-grained lock under SLE can remove all
    the misses induced by many fine-grained locks.

(*) To get a detailed cycle count, we probably need a detailed processor
    timing model (to model a multiple-issue CPU that overlaps misses).
    Ravi mentioned an example in which lock contention is 9% of
    execution time but SLE yields only a 1% speedup, which could be
    caused by long-latency, non-overlapping misses inside critical
    sections.

(*) We may also get the numbers for 8 and 64 processors, to see whether
    our 16-CPU setting stresses lock contention.

(*) "Lock-free section" is a confusing term, since `lock-free' already
    has its own meaning in the community. We might just call it "the
    other".

(*) Ravi mentioned that many DBMS vendors still use simple locking
    implementations because they are portable and easy to use. It was
    also noted that most of the contention (and the resulting dirty
    misses) happens in a thin system-dependent code layer (which
    operates on a small shared data area).

[+] Kunkel, S.; Armstrong, B.; Vitale, P. "System optimization for OLTP
    workloads." IEEE Micro, Volume 19, Issue 3, May-June 1999,
    pp. 56-64.
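
## Worked Example: Bounding the SLE Speedup ##

Ravi's example in the methods notes (contention at 9% of execution time,
SLE giving only a 1% speedup) can be sanity-checked with an Amdahl-style
bound. The sketch below is ours, not part of the discussion. It assumes
that "1% speedup" means a speedup factor of 1.01, and that removing
contention time has no other effect on the run. The function names are
made up for illustration.

```python
def speedup_if_removed(fraction_removed: float) -> float:
    """Amdahl-style speedup when this fraction of execution time disappears."""
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


def fraction_removed_for(speedup: float) -> float:
    """Inverse: fraction of execution time that must vanish to see `speedup`."""
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1")
    return 1.0 - 1.0 / speedup


# Numbers from Ravi's example; reading "1% speedup" as 1.01x is our assumption.
contention = 0.09
observed_speedup = 1.01

# Best case: every cycle of lock contention is eliminated.
upper_bound = speedup_if_removed(contention)
# What the observed speedup implies was actually eliminated.
removed = fraction_removed_for(observed_speedup)
recovered_share = removed / contention

print(f"upper bound if all contention vanished: {upper_bound:.3f}x")
print(f"execution time actually removed:        {removed:.2%}")
print(f"share of contention time recovered:     {recovered_share:.1%}")
```

Under these assumptions the bound is about 1.099x, while the observed
1.01x corresponds to removing about 0.99% of execution time, i.e. roughly
11% of the contention time. The gap is consistent with the suspicion that
long-latency misses inside critical sections dominate, but the arithmetic
alone cannot confirm the cause; that still needs the timing model.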