Cluster Resource Management: A Scalable Approach
Abstract
The last decade has seen an explosion in computing, and in the latter half of
the decade the Internet has brought millions of computers together. Over the
same period, the performance of low-cost personal computers has approached
that of server-grade hardware.
As this gap has narrowed, heavy demand on major web services has fueled the
need for larger computing resources. These two trends have led to a sharp
increase in very large clusters of commodity computers.
These new clusters, with thousands of nodes, have
demonstrated high performance, scalability, and fault tolerance thanks to the
highly parallel nature of Internet workloads.
As the popularity of these systems has grown, it has become clear that
new resource management schemes are needed.
Significant prior work has made single-node resource allocation very
successful, but the management of many nodes has not yet reached maturity.
Much of the previous work on cluster resource management has depended on
centralized managers, which we feel could limit both scalability and fault
tolerance for the largest clusters. Our hierarchical
algorithm achieves cluster-wide usage ratios within 2% of our desired
allocation, with a standard deviation of less than 1%.
Beyond this accuracy, our hierarchy should allow clusters to scale
beyond a thousand nodes without management bottlenecks.
Relevant Links
Paper (html, pdf, ps, doc)
Presentation Slides (html, ppt)
Mid-semester Status Report (html, ppt)
Project Proposal (txt)
Data and Source Code (tar.gz)
The Network Simulator - ns-2