Cluster Resource Management:
A Scalable Approach
Ning Li and Jordan Parker
CS 736 Class Project

Outline
Introduction
A Scalable Approach: Hierarchy
Results
Conclusions
Questions

Why Study Resource Management?
Clusters have become increasingly popular for large-scale parallel computing.
Web Servers
Clusters are growing larger, on the order of thousands of nodes.
Clusters are providing multiple services.
Hard to evaluate
Bad is easy to determine
Good is much harder

Resource Management Example
4th Node Services only B
Poor Management
Ideal

Clustering Goals
Scalability
Reliability
High Performance
Affordability

Related Work
Proportional-Share
Cluster Reserves

Related Work: Approach Differences
Our Goal: to provide a scalable solution for resource management.
Other work focused primarily on just having good management
This often meant one manager for all the nodes
Clearly this could become a scalability bottleneck
Effectiveness: Other solutions are probably better for smaller clusters; we hope to be better for large (>1000 nodes) clusters.

Hierarchy: A Scalable Approach
Hierarchical Management
Nodes service jobs
Managers facilitate resource management
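
A minimal sketch of how such a hierarchy could aggregate usage reports, so that each manager hears only from its direct children. The class names (`Node`, `Manager`), the `report` method, and the usage figures are illustrative assumptions, not details taken from the project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Usage = Dict[str, float]  # service class -> resource usage (hypothetical unit)

@dataclass
class Node:
    name: str
    usage: Usage

    def report(self) -> Usage:
        # A node reports its own per-class usage.
        return dict(self.usage)

@dataclass
class Manager:
    name: str
    children: List[Union["Manager", Node]] = field(default_factory=list)

    def report(self) -> Usage:
        # A manager reports the sum of its children's reports, so its
        # parent receives one summary instead of one message per node.
        total: Usage = {}
        for child in self.children:
            for service, amount in child.report().items():
                total[service] = total.get(service, 0.0) + amount
        return total

# Four nodes under two first-level managers under one root (assumed layout).
nodes = [Node(f"n{i}", {"A": 6.0, "B": 3.0, "C": 1.0}) for i in range(4)]
root = Manager("root", [Manager("m0", nodes[:2]), Manager("m1", nodes[2:])])
summary = root.report()
```

In this arrangement the root processes two reports rather than four; with a fixed fan-out, the number of messages any single manager handles stays constant as the cluster grows.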

Banking Algorithm
Goal
Determine best allocation given previous usage
Primitives
Tickets
Bank accounts
Deposit / withdraw tickets
6 Steps

Banking Algorithm
Step 1: For each service class on each node
Deposit unused tickets
Step 2: For each service class on each node
Reallocate service class
Full utilization: Allocation = usage + k
Under utilization: Allocation = usage - k
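
Steps 1 and 2 for a single node might look like the sketch below. It assumes that "full utilization" means usage has reached the current allocation, that k is a fixed ticket increment, and that allocations are clamped at zero; the slides state none of these, and the function name is hypothetical.

```python
def deposit_and_reallocate(allocation, usage, k):
    """Apply Steps 1 and 2 to one node.

    allocation, usage: dict mapping service class -> tickets.
    Returns (new_allocation, bank), where bank holds the unused
    tickets deposited for each service class.
    """
    bank = {}
    new_allocation = {}
    for service, allocated in allocation.items():
        used = usage[service]
        # Step 1: deposit whatever part of the allocation went unused.
        bank[service] = max(allocated - used, 0)
        # Step 2: grow a fully used class, shrink an under-used one.
        if used >= allocated:          # assumed meaning of "full utilization"
            new_allocation[service] = used + k
        else:
            new_allocation[service] = max(used - k, 0)  # clamp is an assumption
    return new_allocation, bank

new_alloc, bank = deposit_and_reallocate(
    {"A": 60, "B": 30, "C": 10},   # current tickets per class
    {"A": 60, "B": 20, "C": 10},   # tickets actually used
    k=2,
)
```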

Banking Algorithm Cont.
Step 3: For each service class
Compare total allocation to desired
Subtract from over-allocated
Add to needy & under-allocated
Step 4: For each service class
Deposit / Withdraw
If still over-allocated withdraw
If still under-allocated deposit
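
The comparison at the start of Step 3 can be sketched as follows. This covers only the "compare total allocation to desired" part; how the surplus is then subtracted, added, deposited, or withdrawn is not specified in enough detail here to reproduce. The function name and sample numbers are assumptions.

```python
def allocation_gap(node_allocations, desired):
    """Return total allocation minus desired allocation per service class.

    node_allocations: list of dicts (one per node), service class -> tickets.
    desired: dict, service class -> desired total tickets.
    A positive gap means the class is over-allocated, negative under-allocated.
    """
    totals = {service: 0 for service in desired}
    for allocation in node_allocations:
        for service, tickets in allocation.items():
            totals[service] += tickets
    return {service: totals[service] - desired[service] for service in desired}

gap = allocation_gap(
    [{"A": 62, "B": 18, "C": 12}, {"A": 60, "B": 30, "C": 10}],
    desired={"A": 120, "B": 60, "C": 20},
)
```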

Banking Algorithm Cont.
Step 5:
Withdraw and allocate
Reward the needy nodes
Step 6:
Done, clear the bank accounts

Reliability
Bottom-up Manager Replacement

Results

Implementation Details
Simulations via NS, the Network Simulator
Low-bandwidth 10 Mb/s communication network
UDP for lower server overhead
Assumptions
Node-level resource management works ideally

Test 1: Overview
4 nodes – 3 services – 60/30/10 Allocation
4th node receives all of 3rd class’s requests
Steady Workload

Test 1: Data

Test 2: Overview
100 nodes – 3 services – 60/30/10 Allocation
Nodes 1-30 receive all of 3rd class’s requests
Steady Workload

Test 2: Data

Test 3: Overview
100 nodes – 3 services – 60/30/10 Allocation
Nodes 1-30 receive all of 3rd class’s requests
Dynamic Workload

Test 3: Data

Test 4: Overview
100 nodes – 3 services – 60/30/10 Allocation
Nodes 1-30 receive all of 3rd class’s requests
Steady Workload
Reporting 1/5
Nodes every 0.3 seconds
Managers every 1.5 seconds

Test 4: Data

Test 5: Overview
900 nodes – 3 services – 60/30/10 Allocation
Nodes 1-300 receive all of 3rd class’s requests
Steady Workload

Test 5: Data

Conclusions
Benefits of a hierarchy
Scalable
Reliable
Geographic Applications
Implemented a new management scheme: Banking
Comparable Results
Improved Scalability

Conclusions
Clusters are sensitive to small policy changes
Clusters are built for specific workloads
Their performance is important and small changes have significant impact
No scheme is universally applicable
Future Work
Real system implementation
Real Workloads
Real node-level resource management
Steadier performance

Questions

Related Work: Proportional-Share
Stride Scheduling
Ticket-based, similar to lottery scheduling
Scale
Randomly query k nodes to find best allocation
Different Application
Condor-like resource allocation/applications
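
For reference, the basic stride scheduling mechanism named on this slide can be sketched as below. This is the textbook single-resource form, not the cluster-scale variant discussed in the related work; the constant `STRIDE1` and the 60/30/10 ticket split are assumptions chosen for illustration.

```python
import heapq
from collections import Counter

# Hypothetical constant, chosen to be divisible by every ticket count below.
STRIDE1 = 180_000

def stride_schedule(tickets, quanta):
    """Run `quanta` scheduling decisions and count how often each client runs.

    tickets: dict mapping client name -> ticket count.
    Each client's stride is inversely proportional to its tickets; the
    client with the smallest pass value runs next, then its pass value
    advances by its stride.
    """
    # Heap entries are (pass value, client name, stride); ties go by name.
    heap = [(STRIDE1 // t, name, STRIDE1 // t) for name, t in tickets.items()]
    heapq.heapify(heap)
    runs = Counter()
    for _ in range(quanta):
        pass_value, name, stride = heapq.heappop(heap)
        runs[name] += 1
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return runs

runs = stride_schedule({"A": 60, "B": 30, "C": 10}, quanta=100)
```

Unlike lottery scheduling, the selection is deterministic, so each client's share of quanta tracks its share of tickets closely even over short intervals.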

Related Work: Cluster Reserves
Resource Container Schedulers
Constrained Optimization Algorithm
Scale
Centralized single manager

Hierarchical Cluster Reserves – Version 1
Modify Cluster Reserves optimization algorithm
Use it when manager manages nodes
AND when a level n+1 manager manages level n managers.

Hierarchical Cluster Reserves – Version 2
Cluster Reserves optimization algorithm
Use it when manager manages nodes
Don’t use it for upper level managers
Modify the manager-to-manager reporting
Lie to the algorithm