Barrier Optimizations in Implicit Coscheduling

Michael J. Brim and Todd L. Miller

Abstract: Implicit coscheduling allows the member processes of a parallel application to remain coordinated, reducing overall execution time on time-shared distributed systems such as networks of workstations (NOWs). The existing implementation of implicit coscheduling performs close to or better than the ideal, gang scheduling, for a variety of application classes, including bulk-synchronous, continuous-communication, and load-imbalanced applications. One class of applications, however, does not benefit from the strategies of implicit coscheduling: continuous-communication applications in which the communication interval is small, the synchronization interval is large, and the load imbalance is large. We attribute the poor performance of this class to the choices made by local schedulers during barrier operations, since synchronization time accounts for approximately 30% of total execution time for these applications even though barriers are infrequent. We therefore present several techniques for improving barrier performance for this class. First, we investigate tree-based processing of barrier messages, which distributes messaging overhead and overlaps communication. Next, we present several alternative local waiting algorithms that make use of information gathered by several methods. Finally, we describe an algorithm for explicit synchronization of processes at barriers using keep-alive messages sent from the barrier's root process. Our results indicate that most of these optimizations improve performance only slightly or not at all, suggesting that it is inherently difficult to predict the actions of local processes during barriers in such applications.
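
As a rough illustration of why tree-based processing of barrier messages distributes messaging overhead, the sketch below compares the largest number of arrival messages any single process must receive in a flat barrier and in a tree barrier. It assumes a k-ary tree rooted at rank 0 with children numbered breadth-first; the function names, the tree layout, and the fanout are hypothetical choices for this example and are not taken from the paper or its source code.

```python
def tree_parent(rank, fanout=2):
    """Parent of `rank` in a k-ary tree rooted at rank 0 (None for the root).

    Assumed layout: the children of rank r are r*fanout + 1 .. r*fanout + fanout.
    """
    return None if rank == 0 else (rank - 1) // fanout


def tree_children(rank, nprocs, fanout=2):
    """Children of `rank` that exist among `nprocs` processes."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < nprocs]


def max_messages_per_process(nprocs, fanout=None):
    """Largest number of barrier-arrival messages any one process receives.

    fanout=None models a flat barrier in which every process reports
    directly to rank 0; otherwise each process hears only from its children.
    """
    if fanout is None:
        return nprocs - 1
    return max(len(tree_children(r, nprocs, fanout)) for r in range(nprocs))


if __name__ == "__main__":
    for n in (4, 16, 64):
        flat = max_messages_per_process(n)
        tree = max_messages_per_process(n, fanout=2)
        print(f"{n} processes: flat root receives {flat}, binary tree max is {tree}")
```

In this model the flat barrier's root receives one message per other process, while no process in the tree receives more than `fanout` messages. The trade-off is that an arrival must travel through several tree levels before the root learns of it.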

Paper available as: PostScript or PDF

All source code available here: Gzipped tar file