This study asks how best to schedule sequential jobs on a cluster of workstations. To that end, we compiled a suite of four workload types, each with a particular emphasis: CPU, Development, Interactive, and I/O. We ran a number of sensitivity tests on the workloads to establish their resource requirements. The tests cover run-times on different workstation models, the effect of sharing the processor, the effect of flushing the file cache between runs, available parallelism, the cost of remote execution, and the effect of network over-utilization.
We found that some workloads (CPU) are quite sensitive to the particular processor model they run on and, compounding the problem, there is no total ordering on the set of machines; that is, no single ranking of the machines from fastest to slowest holds for every workload. Not surprisingly, CPU-intensive workloads (CPU, Development, I/O?) suffer when sharing the CPU with another process, whereas jobs that use little of the CPU (Interactive) can easily share that resource. The effect of flushing the file cache is noticeable in some of the workloads (especially Development and I/O), with slowdowns of up to 80%. This implies that, when scheduling a particular user's job stream, it may matter where that user last ran a job.
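The lack of a total ordering can be illustrated with a small check over a table of run-times. The sketch below is only an illustration under stated assumptions: the machine names and run-times are hypothetical placeholders, not measurements from this study, and the helper names `no_slower`, `incomparable_pairs`, and `best_machine` are invented for the example. Two machines are treated as incomparable when each one beats the other on at least one workload.

```python
from itertools import combinations

# Hypothetical run-times in seconds, indexed as runtimes[machine][workload].
# These numbers are placeholders for illustration, NOT results from the study.
runtimes = {
    "machine_a": {"CPU": 100.0, "Development": 80.0},
    "machine_b": {"CPU": 120.0, "Development": 60.0},
    "machine_c": {"CPU": 130.0, "Development": 90.0},
}


def no_slower(first, second, table):
    """Return True if `first` is at least as fast as `second` on every workload."""
    return all(table[first][w] <= table[second][w] for w in table[first])


def incomparable_pairs(table):
    """Return machine pairs where neither machine is uniformly at least as fast.

    Any such pair means the machines cannot be placed in one fastest-to-slowest
    ranking that is valid for all workloads.
    """
    pairs = []
    for a, b in combinations(sorted(table), 2):
        if not no_slower(a, b, table) and not no_slower(b, a, table):
            pairs.append((a, b))
    return pairs


def best_machine(workload, table):
    """Pick the machine with the lowest run-time for one workload."""
    return min(table, key=lambda m: table[m][workload])


if __name__ == "__main__":
    print(incomparable_pairs(runtimes))
    for workload in ("CPU", "Development"):
        print(workload, "->", best_machine(workload, runtimes))
```

With this placeholder table, the first two machines are incomparable, so a scheduler would have to choose a machine per workload type rather than rely on one global ranking.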
Further analysis of the workloads reveals that they have a fair
amount of available parallelism. Users who submit multiple jobs at a
time could see their throughput vastly increased. However, when
running jobs on the workstations, the scheduling policy must not
treat some users unfairly by overcommitting the workstations to one
particular user. The cost of remote execution on today's machines was
measured at around 2 seconds. Further, we found that a value of
We designed the following four "typical" workloads for our experiments:
CPU Workload
Development Workload
Interactive Workload
I/O Workload
The following links give the results from some of our experiments:
Sensitivity to Processor Model
Sensitivity to Sharing CPU
Workload Parallelism
Sensitivity to Locality
Sensitivity to Network Over-Utilization