Cluster Resource Management: A Scalable Approach

Ning Li

Jordan Parker

The last decade has seen an explosion in computing, and in the latter half of the decade the Internet has connected millions of these computers. Over the same period, low-cost personal computers have approached server-grade hardware in performance. As this gap has narrowed, heavy demand on major web services has driven the need for larger computing resources. Together, these two trends have led to a sharp increase in the number of very large clusters of commodity computers.

These new clusters, with thousands of nodes, have demonstrated high performance, scalability, and fault tolerance thanks to the highly parallel nature of Internet workloads. As these systems have grown in popularity, the need for new resource management schemes has become clear. Considerable prior work has made single-node resource allocation very successful, but managing resources across many nodes is far less mature. Much previous cluster resource management has depended on centralized managers, which we believe can limit both scalability and fault tolerance in the largest clusters. Our hierarchical algorithm achieves cluster-wide usage ratios within 2% of the desired allocation, with a standard deviation of less than 1%. Beyond this performance, the hierarchy should allow clusters to scale beyond a thousand nodes without management bottlenecks.
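
To make the idea of hierarchical aggregation and the "usage ratio versus desired allocation" metric concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the `Node` class, the service-class names "A" and "B", and all numbers are hypothetical and chosen only for illustration. It assumes each leaf node reports per-class usage and each manager sums the reports of its own children, so that no single manager has to contact every node.

```python
from dataclasses import dataclass, field
from statistics import pstdev


@dataclass
class Node:
    """Hypothetical tree node: a leaf holds measured usage, a manager holds children."""
    usage: dict = field(default_factory=dict)     # per-class usage at a leaf
    children: list = field(default_factory=list)  # empty for leaves

    def aggregate(self):
        """Return per-class usage summed over this subtree."""
        if not self.children:
            return dict(self.usage)
        total = {}
        for child in self.children:
            for cls, amount in child.aggregate().items():
                total[cls] = total.get(cls, 0.0) + amount
        return total


def usage_ratios(totals):
    """Convert absolute per-class usage into fractions of total usage."""
    overall = sum(totals.values())
    if overall == 0:
        return {cls: 0.0 for cls in totals}
    return {cls: amount / overall for cls, amount in totals.items()}


def allocation_error(ratios, desired):
    """Signed difference between the achieved ratio and the desired share."""
    return {cls: ratios.get(cls, 0.0) - share for cls, share in desired.items()}


# Two managers, each responsible for a subset of the leaves (illustrative values).
manager1 = Node(children=[Node(usage={"A": 30, "B": 10}),
                          Node(usage={"A": 30, "B": 30})])
manager2 = Node(children=[Node(usage={"A": 60, "B": 40})])
root = Node(children=[manager1, manager2])

ratios = usage_ratios(root.aggregate())
errors = allocation_error(ratios, {"A": 0.58, "B": 0.42})

# Spread of one class's ratio over several hypothetical measurement intervals.
spread = pstdev([0.59, 0.60, 0.61])
```

In this sketch the root only ever combines two partial sums, one per manager, which is the property that would let such a hierarchy avoid a single management bottleneck as the node count grows.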

Relevant Links

Paper (html, pdf, ps, doc)

Presentation Slides (html, ppt)

Mid-semester Status Report (html, ppt)

Project Proposal (txt)

Data and Source Code (tar.gz)

The Network Simulator - ns-2