Barrier Optimizations in Implicit Coscheduling
Michael J. Brim and Todd L. Miller
Implicit coscheduling allows member processes in a parallel application to
remain coordinated, reducing overall execution time when run on time-shared
distributed systems such as networks of workstations (NOWs). The existing
implementation of implicit coscheduling performs close to or better than
the ideal, gang scheduling, for a variety of application classes, including
bulk-synchronous, continuous-communication, and load-imbalanced applications.
Still, one class of applications does not benefit from the strategies of
implicit coscheduling: continuous-communication applications in which the
communication interval is small, the synchronization interval is large, and
the load imbalance within the application is large.
We attribute the poor performance of this class of applications to poor
choices made by local schedulers during barrier operations, since it can be
shown that synchronization time accounts for approximately 30% of the total
execution time for these applications, even though barriers are infrequent.
We therefore present multiple techniques for improving barrier performance
for this class of applications. First, we investigate the use of tree-based
processing of barrier messages, in order to distribute messaging overhead
and overlap communication. Next, we present several alternative local waiting
algorithms that make use of information gathered by several methods. Finally,
we describe an algorithm for explicit synchronization of processes at barriers
using keep-alive messages sent from the barrier's root process. Our results
indicate that most of the optimizations presented yield either a slight
improvement in performance or no significant improvement, suggesting that
it is inherently difficult to predict the actions of local processes during
barriers in such applications.
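
To illustrate how tree-based processing of barrier messages can distribute
messaging overhead, the following is a minimal sketch and not the authors'
implementation. It assumes ranks are arranged in a heap-style k-ary tree
rooted at rank 0; the function names and the fanout parameter are
hypothetical. It counts the arrival messages each process must handle in a
flat barrier versus a tree barrier.

```python
def tree_children(rank, n_procs, fanout=2):
    """Ranks of the children of `rank` in a k-ary tree rooted at rank 0 (assumed layout)."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < n_procs]


def tree_parent(rank, fanout=2):
    """Rank of the parent of `rank`, or None for the root."""
    return None if rank == 0 else (rank - 1) // fanout


def arrival_messages_received(n_procs, fanout=None):
    """Arrival messages each rank receives during one barrier.

    fanout=None models a flat barrier: every non-root rank messages the root.
    Otherwise each rank receives one message per child in the tree.
    """
    if fanout is None:
        return [n_procs - 1] + [0] * (n_procs - 1)
    return [len(tree_children(r, n_procs, fanout)) for r in range(n_procs)]


def tree_depth(n_procs, fanout=2):
    """Number of hops from the deepest rank up to the root."""
    depth, rank = 0, n_procs - 1  # the highest rank is at the deepest level
    while rank != 0:
        rank = tree_parent(rank, fanout)
        depth += 1
    return depth


if __name__ == "__main__":
    n = 16
    print("flat root load:", arrival_messages_received(n)[0])
    print("tree max load: ", max(arrival_messages_received(n, fanout=2)))
    print("tree depth:    ", tree_depth(n, fanout=2))
```

With 16 processes, the flat root handles 15 arrival messages, while in the
binary tree no process handles more than 2, at the cost of 4 levels of
message latency between the deepest process and the root.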
Paper available as: Postscript
All source code available here: Gzipped tar file