Cluster Resource Management: Scalable Approaches
Ning Li
Jordan Parker
Mid-semester Status Report
CS 736 – Fall 2000

Why Study Cluster Resource Management?
Clusters have become increasingly popular for large-scale parallel computing.
Web Servers
Clusters are growing increasingly large, on the order of thousands of nodes.
Clusters are providing multiple services.

Multiple Services: Example
An Internet Service Provider is hosting many different websites for clients
How do you schedule according to the amount of bandwidth a client is paying for?
Proportional Share
Cluster Reserves
Our technique is more scalable.

Overview
Introduction / Reason for Research
Related Work
Infrastructure
Evaluation

Related Work
A. C. Arpaci-Dusseau, D. E. Culler, and A. Mainwaring. Scheduling with Implicit Information in Distributed Systems. In Proceedings of the ACM SIGMETRICS '98 Conference on the Measurement and Modeling of Computer Systems, 1998.
A. Fox, S. D. Gribble, Y. Chawathe, and E. A. Brewer. Cluster-Based Scalable Network Services. In Proceedings of the 16th Symposium on Operating Systems Principles (SOSP-16), St. Malo, France, October 1997.
M. Aron, P. Druschel, and W. Zwaenepoel. Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers. In Proceedings of ACM SIGMETRICS 2000, June 2000.
C. A. Waldspurger and W. E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management. In Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), Monterey, CA, November 1994, pp. 1-11.
NS – Network Simulator Manual, http://www.isi.edu/nsnam/ns/ns-documentation.html

What makes us different?
Goal: provide a scalable solution for resource management.
Other papers focused primarily on just having good management.
This often meant one manager for all the nodes.
Clearly this could become a scalability bottleneck.
Effectiveness: other solutions are probably better for small clusters; we hope to be better for large (>1000 node) clusters.

The Management Scheme
Cluster Reserves with multiple managers
Serves mainly as a comparison baseline
A new Lottery-like algorithm (Banks)
A hierarchical management network

Infrastructure
The Hierarchical Algorithms
Use NS to simulate our algorithms

Hierarchical View

A Problem and a Solution
Problem:  a single central manager is not scalable
Solution: Hierarchy! + Fault Tolerance
  (a nice little example, perhaps with two levels of managers)

Approach 1:
modify the "Cluster Reserves" optimization algorithm
use it when a manager manages nodes
AND when a level-(n+1) manager manages level-n managers

Approach 2:
introduce a bank account mechanism
use the bank algorithm for a manager managing nodes
use the transfer strategy for a level-(n+1) manager managing level-n managers

Problem Specification:
N: # of nodes in a cluster
S: # of service classes
T: a vector of N elements, T_i: resource (# of tickets) on node i
T_total: total resource in the cluster (not in the Cluster Reserves paper)
r and u: NxS matrices, r_ij and u_ij: the percentage resource allocation and resource usage, respectively, at node i for service class j.
D: a vector of S elements, D_j: the desired percentage resource allocation for service class j over the cluster.
Input:  the matrices r and u and the vectors T and D
Output: an NxS matrix R, R_ij: the new percentage resource allocation for service class j on node i.
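As a concrete sketch of the data layout (the numbers and the use of Python/NumPy are purely illustrative, not part of our design):

  import numpy as np

  N, S = 3, 2                       # 3 nodes, 2 service classes
  T = np.array([100, 100, 100])     # T_i: tickets on node i
  T_total = T.sum()                 # 300 tickets cluster-wide
  D = np.array([60.0, 40.0])        # desired cluster-wide percentage per class
  # r, u: N x S percentage allocation / usage from the previous interval
  r = np.array([[60, 40], [60, 40], [60, 40]], dtype=float)
  u = np.array([[30, 40], [60, 20], [60, 40]], dtype=float)
  # output: R, an N x S matrix of new percentage allocations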

Solution Step 1:
Compute the least feasible deviation between desired and actual allocations.
     Minimize  sum_{j=1}^{S} | sum_{i=1}^{N} R_ij * T_i - T_total * D_j |    (1)
Resource allocations on any cluster node should sum to no more than 100%.
     for any i in 1..N:  sum_{j=1}^{S} R_ij <= 100
On any node, the new allocation should be no more than the usage if the node is not a resource sink, i.e. if the previous allocation exceeded the usage.
     for any i, j:  R_ij <= u_ij  if r_ij > u_ij
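One way step 1 could be posed as a linear program is sketched below. The use of scipy.optimize.linprog, the auxiliary variables e_j standing in for the absolute values, and the helper name step1_deviation are our own illustration, not the paper's formulation:

  import numpy as np
  from scipy.optimize import linprog

  def step1_deviation(r, u, T, D):
      # variables: the N*S entries of R (flattened), plus one e_j per class
      # with e_j >= |sum_i R_ij*T_i - T_total*D_j|
      N, S = r.shape
      T_total = T.sum()
      n_vars = N * S + S
      c = np.zeros(n_vars)
      c[N * S:] = 1.0                      # minimize sum_j e_j

      A_ub, b_ub = [], []
      for j in range(S):
          row = np.zeros(n_vars)
          for i in range(N):
              row[i * S + j] = T[i]
          row[N * S + j] = -1.0            #  sum_i T_i*R_ij - e_j <=  T_total*D_j
          A_ub.append(row)
          b_ub.append(T_total * D[j])
          neg = -row
          neg[N * S + j] = -1.0            # -sum_i T_i*R_ij - e_j <= -T_total*D_j
          A_ub.append(neg)
          b_ub.append(-T_total * D[j])
      for i in range(N):                   # per-node capacity: sum_j R_ij <= 100
          row = np.zeros(n_vars)
          row[i * S:(i + 1) * S] = 1.0
          A_ub.append(row)
          b_ub.append(100.0)

      bounds = [(0.0, u[i, j] if r[i, j] > u[i, j] else 100.0)  # R_ij <= u_ij when
                for i in range(N) for j in range(S)]            # not a resource sink
      bounds += [(0.0, None)] * S          # e_j >= 0

      res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
      return res.fun                       # the least feasible deviation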

Solution Step 2:
Compute the new resource allocations s.t.
 the deviation computed in the first step is achieved, and
 the computed resource allocations are close to the ideal allocation D (this differs from the paper; we want to see which works better)
     Minimize  sum_{i=1}^{N} sum_{j=1}^{S} (R_ij - D_j)^2    (2)
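A tiny worked example of objective (2), with hypothetical numbers: among allocations achieving the same step-1 deviation, we prefer the one closest to the ideal split D.

  import numpy as np

  D   = np.array([60.0, 40.0])            # ideal percentage split
  # two candidates with the same step-1 deviation (T_i equal on both nodes)
  R_a = np.array([[60, 40], [60, 40]], dtype=float)
  R_b = np.array([[80, 20], [40, 60]], dtype=float)

  def objective2(R, D):
      return ((R - D) ** 2).sum()         # sum_i sum_j (R_ij - D_j)^2

  print(objective2(R_a, D))               # 0.0    -> preferred
  print(objective2(R_b, D))               # 1600.0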

A New Idea/Addition
Distribute unassigned cluster resource to service classes that need it
Since the manager knows when and how much resource a service class contributed before, it can give appropriate priority to those classes when assigning unused resource (see the sketch below).
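A minimal sketch of one way the manager could weight past contributions when handing out unused tickets; the policy below (contribution-proportional, with a +1 floor) is our own guess, not a settled design:

  def distribute_unused(unused, needy, contributed):
      # contributed[j]: hypothetical running total of the tickets class j
      # gave up in earlier intervals, kept by the manager
      weights = {j: contributed.get(j, 0) + 1 for j in needy}  # +1: never starve
      total = sum(weights.values())
      return {j: unused * weights[j] / total for j in needy}

For example, with 30 unused tickets and past contributions {A: 90, B: 10}, class A would receive about 27 of them.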

Approach 2: Bank Account Mechanism
Each manager has a bank.
Each bank has an account for each service class.
Each account records the # of tickets saved and when they were deposited.
Together, depositing, withdrawing, and transferring tickets achieve both performance isolation and high resource utilization.

Bank Algorithm: part 1
For each service class j on each node i:
compare the previous ticket usage u_ij, the allocation r_ij, and the desired allocation D_j
 
  1 u_ij < r_ij and r_ij <= D_j:    R_ij = u_ij
    deposit D_j - R_ij to its bank account

  2 u_ij < r_ij and r_ij > D_j:     R_ij = min(u_ij, D_j)
    deposit D_j - R_ij to its bank account if it is greater than 0

  3 u_ij = r_ij and r_ij < D_j:     R_ij = D_j
                                    (or R_ij = u_ij + k, where k is a small #)

  4 u_ij = r_ij and r_ij >= D_j:    R_ij = D_j
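The four cases transcribe almost directly into code; a sketch for a single <node, class> pair (bank_ij and the constant k follow the slide's notation):

  def part1_new_allocation(u_ij, r_ij, D_j, bank_ij, k=1.0):
      # returns (R_ij, updated bank balance) for one class on one node
      if u_ij < r_ij and r_ij <= D_j:        # case 1
          R_ij = u_ij
          bank_ij += D_j - R_ij
      elif u_ij < r_ij and r_ij > D_j:       # case 2
          R_ij = min(u_ij, D_j)
          if D_j - R_ij > 0:
              bank_ij += D_j - R_ij
      elif u_ij == r_ij and r_ij < D_j:      # case 3
          R_ij = D_j                         # (or u_ij + k for a gentler ramp-up)
      else:                                  # case 4: u_ij = r_ij, r_ij >= D_j
          R_ij = D_j
      return R_ij, bank_ij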

Bank Algorithm: part 2
let t_i be # of tickets currently allocated on node i
  IF t_i >= T_i
    normalize the tickets so that t_i = T_i
 
  ELSE
    check balance B_ij in bank account for class j in case 4 above

Bank Algorithm: part 2 (continued)
option 1: check classes in decreasing balance order
      let b_ij = min(B_ij, h), where h is a relatively small #
        R_ij += b_ij, and draw b_ij from j's bank account
        t_i  += b_ij
      repeat until t_i >= T_i
option 2: check all classes in case 4 above with balance >= 0
allocate the T_i - t_i remaining tickets to these classes in proportion to their bank balances, and draw from the bank accounts accordingly (a combined sketch of parts 2 and 3 follows part 3)

Bank Algorithm: part 3
if there are still unassigned tickets, assign them to the classes in case 4 above in proportion to their share or their need
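A combined sketch of parts 2 and 3 for one node, using option 1's draw loop. Allocations are kept in tickets here for simplicity, and h caps how much any one class draws per pass:

  def parts2_and_3(R_i, bank_i, case4, T_i, D, h=5.0):
      # R_i[j]: tickets allocated to class j on this node; bank_i[j]: balances;
      # case4: classes that fell into case 4 of part 1; D[j]: class j's share
      t_i = sum(R_i.values())
      if t_i >= T_i:                         # over-committed: normalize down
          for j in R_i:
              R_i[j] *= T_i / t_i
          return R_i
      # part 2, option 1: draw from banks in decreasing balance order
      for j in sorted(case4, key=lambda c: bank_i[c], reverse=True):
          if t_i >= T_i:
              break
          b_ij = min(bank_i[j], h, T_i - t_i)
          R_i[j]    += b_ij
          bank_i[j] -= b_ij
          t_i       += b_ij
      # part 3: leftover tickets go to case-4 classes, proportional to share
      if t_i < T_i:
          leftover = T_i - t_i
          total_share = sum(D[j] for j in case4) or 1.0
          for j in case4:
              R_i[j] += leftover * D[j] / total_share
      return R_i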

Notes and Other Strategies:
Note: tickets in a bank account have a time-stamp associated with them, and will expire after reaching a certain age.
Strategy: the manager could force some compensation if, before adjustment, t_i >= T_i on all the nodes while some classes still have high balances in their accounts. The manager could allocate a reasonable number of tickets as in option 2 above, then normalize so that t_i equals T_i.
Strategy: a class on some node may choose to reserve some tickets for its own use on that same node in the near future, rather than depositing them in the bank. We will evaluate this option.

Transfer Strategy: Very simple
Based on the previous usage reports from lower-level managers, the current manager transfers tickets from one account to another where they are badly needed.

Transfer Strategy:
More detailed (if needed)
check class-manager <j,i> pairs in decreasing usage/share order, i.e.
check first those classes that most need more tickets
check j's accounts on other managers l where usage/share is low
      transfer min(B_lj, b) tickets from j's account on manager l
      to j's account on manager i, where b is a constant
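A sketch of this strategy; the usage/share thresholds around 1.0 that decide who is needy and who is a donor are our own assumption:

  def transfer(balances, usage_over_share, b=10.0):
      # balances[l][j]: class j's bank balance at manager l
      # usage_over_share[l][j]: previous usage / share for class j under manager l
      pairs = sorted(((m, j) for m in balances for j in balances[m]),
                     key=lambda p: usage_over_share[p[0]][p[1]], reverse=True)
      for m, j in pairs:                     # neediest <class, manager> pairs first
          if usage_over_share[m][j] <= 1.0:  # the rest are not short of tickets
              break
          for l in balances:                 # donors: managers where j runs light
              if l == m or usage_over_share[l][j] >= 1.0:
                  continue
              amount = min(balances[l][j], b)
              balances[l][j] -= amount
              balances[m][j] += amount
      return balances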

We are still thinking of better strategies. :-)
Any ideas?

Network View

Full Network Overview

Failure Design
We essentially tried to create a structure similar to a tree
Thus we delete nodes and handle the recovery much as we would remove a node from a tree

Minor Node(6) Failure

1st Level Manager(2) Failure

2nd Level Manager(1) Failure

Node Insertion
Simply find a manager with room for more nodes
If there is no space, simply promote a leaf node into a manager
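A sketch of the insertion rule as a tree operation; the Manager class and its fan-out limit are hypothetical stand-ins for our actual layout:

  class Manager:
      def __init__(self, name, fanout=8):
          self.name = name
          self.fanout = fanout          # assumed max children per manager
          self.children = []            # leaf nodes and/or lower-level managers

      def insert(self, node):
          if len(self.children) < self.fanout:
              self.children.append(node)        # found a manager with room
              return
          for child in self.children:
              if isinstance(child, Manager):
                  child.insert(node)            # recurse down the hierarchy
                  return
          leaf = self.children.pop()            # no space: promote a leaf
          promoted = Manager("mgr", self.fanout)
          promoted.children = [leaf, node]
          self.children.append(promoted)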

Why discuss failure?
Failure handling is not relevant to the performance of our scheduler, and we do not even plan to simulate it (unless we have lots of free time), but …
It does show that the network layout we have designed can easily handle failures
Making the tree balance itself and handling failures could be relatively straightforward

Network Simulator - NS
Our Components
A new Agent Class: RsrcAgent
Agents are servers running on a node
A script to create the ns input file
Specifies network layout
Number of Nodes
Nodes per Manager
Specifies the request trace
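A sketch of what the generator script could look like (Python emitting an OTcl input file for ns; the link parameters and file names are placeholders for our actual script):

  def write_ns_input(path, n_nodes, nodes_per_manager):
      # emit an OTcl script wiring leaf nodes to first-level managers
      n_managers = (n_nodes + nodes_per_manager - 1) // nodes_per_manager
      with open(path, "w") as f:
          f.write("set ns [new Simulator]\n")
          for m in range(n_managers):
              f.write(f"set mgr({m}) [$ns node]\n")
          for i in range(n_nodes):
              m = i // nodes_per_manager
              f.write(f"set node({i}) [$ns node]\n")
              # assumed link parameters: 10Mb bandwidth, 5ms delay
              f.write(f"$ns duplex-link $node({i}) $mgr({m}) 10Mb 5ms DropTail\n")
          f.write("$ns run\n")

  write_ns_input("cluster.tcl", n_nodes=32, nodes_per_manager=8)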

NS implementation status
Look at code

Evaluation
NS should make it easy
Just extract load-balance information from the nodes
More importantly, look at the rate at which queries are handled by the nodes