Towards Process Management on a Network of Workstations

The goal of this study is to determine how best to schedule sequential jobs on a cluster of workstations. To this end, we compiled a suite of four workloads, each with a particular emphasis: CPU, Development, Interactive, and I/O. We performed a number of sensitivity tests on the workloads to establish their resource requirements. These tests measure run times on different workstation models, the effect of sharing the processor, the effect of flushing the file cache between runs, the available parallelism, the cost of remote execution, and the effect of network over-utilization.
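
Each sensitivity test compares a workload's run time under some condition (a shared CPU, a flushed file cache, a loaded network) with a baseline run. The sketch below assumes slowdown is reported as the percentage increase over the baseline run time; the function name `slowdown_percent` and the sample timings are hypothetical and are not measurements from this study.

```python
def slowdown_percent(baseline_s: float, test_s: float) -> float:
    """Percentage increase of a test run time over its baseline run time.

    Assumed definition: 0.0 means no slowdown, 80.0 means the test run
    took 1.8 times as long as the baseline run.
    """
    if baseline_s <= 0:
        raise ValueError("baseline run time must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return (test_s - baseline_s) * 100.0 / baseline_s


# Hypothetical timings in seconds, for illustration only.
print(slowdown_percent(100.0, 180.0))  # 80.0
```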

We have found that some workloads (CPU) are quite sensitive to the particular processor model they run on and, further compounding the problem, there is no total ordering on the set of machines: they cannot be ranked by speed consistently across all workloads. Not surprisingly, CPU-intensive workloads (CPU, Development, and possibly I/O) suffer when sharing the CPU with another process, whereas jobs that use little CPU (Interactive) can easily share the processor. Flushing the file cache has a noticeable effect on some of the workloads (especially Development and I/O), with slowdowns of up to 80%. This implies that when scheduling a particular user's job stream, it may be important to consider where that user last ran a job.
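
One way a scheduler could act on the file-cache result is to remember each user's last machine and prefer it when it is idle. The following is a minimal sketch under that assumption; `Workstation`, `LocalityScheduler`, and `place` are hypothetical names, and the policy is an illustration rather than the scheduler evaluated here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workstation:
    name: str
    idle: bool


@dataclass
class LocalityScheduler:
    """Hypothetical placement policy that prefers a user's last machine.

    Assumption: the machine where a user last ran a job is the one most
    likely to still hold that user's files in its file cache.
    """
    last_host: dict[str, str] = field(default_factory=dict)

    def place(self, user: str, machines: list[Workstation]) -> str | None:
        idle = [m for m in machines if m.idle]
        if not idle:
            return None  # no idle workstation; the caller may queue the job
        preferred = self.last_host.get(user)
        for m in idle:
            if m.name == preferred:
                choice = m  # warm cache: reuse the user's last machine
                break
        else:
            choice = idle[0]  # no usable history: take any idle machine
        self.last_host[user] = choice.name
        return choice.name


sched = LocalityScheduler()
pool = [Workstation("ws1", idle=True), Workstation("ws2", idle=True)]
sched.place("alice", pool)  # "ws1": no history, first idle machine
pool[0].idle = False
sched.place("alice", pool)  # "ws2": ws1 is busy, so fall back
pool[0].idle = True
sched.place("alice", pool)  # "ws2": alice's last machine is idle, reuse it
```

Falling back to any idle machine keeps the policy work-conserving; a stricter variant could instead wait briefly for the preferred machine when the expected cache benefit is large.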

Further analysis reveals that the workloads have a fair amount of available parallelism, so users who submit multiple jobs at a time could see their throughput vastly increased. However, the scheduling policy must not be unfair to other users by overcommitting the workstations to one particular user. We measured the cost of remote execution on current machines at around 2 seconds. Further, we found that a value of seconds was necessary for this cost to cause negligible slowdown. Lastly, when the network was over-utilized, some of the workloads (Development, I/O) slowed down significantly. This suggests that the scheduling policy should perhaps monitor network activity (and its source) and adjust its decisions based on current network conditions.
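
The roughly 2-second remote-execution cost matters less as jobs get longer. The sketch below assumes the cost is a fixed per-job overhead c added to a job of run time T, so the relative slowdown is c / T; it also shows one hypothetical way to keep a single user from claiming every idle workstation. The names `remote_slowdown`, `min_run_time`, and `machines_for`, and the equal-share cap, are illustrative assumptions, not the policy studied here.

```python
from __future__ import annotations

import math

REMOTE_EXEC_COST_S = 2.0  # approximate measured cost of one remote execution


def remote_slowdown(run_time_s: float, overhead_s: float = REMOTE_EXEC_COST_S) -> float:
    """Fractional slowdown when a fixed overhead is added to a job of run_time_s."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return overhead_s / run_time_s


def min_run_time(max_slowdown: float, overhead_s: float = REMOTE_EXEC_COST_S) -> float:
    """Shortest job whose fixed overhead stays within max_slowdown (a fraction)."""
    if max_slowdown <= 0:
        raise ValueError("slowdown target must be positive")
    return overhead_s / max_slowdown


def machines_for(user_demand: dict[str, int], idle_machines: int) -> dict[str, int]:
    """Grant idle machines to users, capping each user at an equal share.

    Hypothetical fair-share rule: no user receives more than
    ceil(idle_machines / active_users) machines or more than requested.
    Machines left over after capping stay unassigned in this simple sketch.
    """
    active = sorted(u for u, n in user_demand.items() if n > 0)
    if not active:
        return {}
    cap = math.ceil(idle_machines / len(active))
    grant: dict[str, int] = {}
    remaining = idle_machines
    for user in active:
        grant[user] = min(user_demand[user], cap, remaining)
        remaining -= grant[user]
    return grant
```

Under this model, a 20-second job would be slowed by about 10% (`remote_slowdown(20.0)`), and `min_run_time` inverts the relation for a chosen slowdown target. With four idle machines, `machines_for({"alice": 10, "bob": 1}, 4)` grants alice 2 and bob 1 rather than letting alice take all four.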

We designed the following four "typical" workloads for our experiments:

CPU Workload

Development Workload

Interactive Workload

I/O Workload

The following links give the results from some of our experiments:

Sensitivity to Processor Model

Sensitivity to Sharing CPU

Workload Parallelism

Sensitivity to Locality

Sensitivity to Network Over-Utilization



Remzi H. Arpaci

Andrea C. Dusseau

Amin M. Vahdat