How to use the COW


What is the COW?

There are 40 Sun SPARCstation 20/51 workstations which form our cheap testbed for parallel computing.

All of the workstations are connected via a 1 Mbyte/sec ethernet.

4 of the 40 workstations are used as "compilation" hosts, where Solaris software can be compiled and tested. These machines are named after cuts of meat (flank, sirloin, strip, filet).

The remaining 36 workstations are used to do "work". Workstations are allocated to groups; individuals within a group use the machines allocated to their group.

Communication

One part of the COW is the individual nodes. The other part of a parallel machine is the high-speed communication infrastructure which allows data to move from one node to another.

The original plan was to interconnect the COW nodes via ATM. That never happened. Needing some interconnect for their work, the Wind Tunnel project bought Myrinet interface cards and switches.

Historically, the following problems have limited the overall connectivity of the Myrinet network at our site.

  1. software incompatibility between the Illinois "Fast Message" system and the Myrinet software
  2. version 1.0 Myrinet doesn't route IP packets across more than one switch (we have upgraded to 2.X, which does allow this)

Fortunately, these problems may be eliminated soon, as the Wind Tunnel is switching to the Berkeley Messaging system, which can operate alongside normal Myrinet traffic.

Node Names

The individual "compute" nodes are named 'cowe00' through 'cowe35'. The 'cow' part indicates that the machine is part of the COW. The 'e' indicates that the name belongs to the ethernet interface of that node. The two digits, with leading 0s, identify the individual cows. To communicate via a particular interconnect, use the form of the hostname with the 'interface' character changed to the desired interface: 'e' for ethernet, 'm' for IP over Myrinet, and 'a' for ATM (if we ever get it).
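
For example (a sketch; this assumes the IP-over-Myrinet configuration described later in this document has been set up on the node), ordinary network tools simply use the alternate hostname:

cow% ping cowm04
cow% rlogin cowm04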

Myrinet Partitions

To allow the various groups using the cow to cow-exist without slaughtering each other, we have partitioned the myrinet hardware into several independent networks.

Cow Software Functionality

After a long discussion, it was discovered that there are almost no common software interests among the many groups using the cow.

However, all these groups still need to be able to use the cow alongside each other.

To facilitate this, a set of fundamental utilities was created. These utilities provide for selecting groups of nodes, configuring those nodes, and reserving nodes.

The Partition Manager

The partition manager allocates nodes into named units which may be used to refer to those nodes as a whole. It is similar in concept to a filesystem, which allocates and names blocks into files, and files into directories.

For example: create a partition from nodes 1..4, and call it shylock.
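
With the getcube command described below, that might look like the following sketch ('shylock' is just the example name, and the node numbers are relative to the enclosing partition):

cow% getcube -l 1,2,3,4 shylock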

Reservation Manager

The Reservation Manager is a facility which allows users of the COW to schedule use of the nodes among themselves. Arbitration between requests may be performed either by algorithmic means, or punted to human intervention.

The reservation manager may reserve more than just nodes; it may also optimize requests onto the underlying communication infrastructure, and special hardware installed on a per-node basis.

For example:

  1. I need 10 nodes tomorrow from 0600 to 1800.
  2. I need 5 myrinet nodes for the rest of the day.
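
As a sketch, the second request could be written with the creserve command described later in this document (the exact time syntax is covered in the creserve documentation; constraining the nodes to Myrinet-equipped ones is not shown here):

cow% creserve -nodes 5 -mode nosched -until 11pm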

Batch Manager

The Batch Manager lets users execute jobs on the COW automagically. Batch jobs will be run in a first-come, first-served manner, give or take the nodes available on the COW.

For example: run ray_tracer on 5 nodes.

Configuration Manager

[ This section left not quite blank ]

Cow Software Systems

These software systems implement the facilities described in the previous section.

Getcube

The cube mangler is an implementation of the partition manager functionality. It consists of two pieces...
  1. A daemon, named cube_mgr, which is started from the COW init.d script. This daemon reads and maintains the current configuration of partitions from a text database in /usr/adm/cow/config.
  2. A set of utilities which communicate with the daemon to examine and change the state of the partition database. These utilities live in /p/cow/bin, and all have the tag cube in their names.

Commands

In the following commands, if no [partition] is specified, the root partition is used. The '.' character is used as the path separator in partition names. A short example session follows the command descriptions.

lscube [partition]
lists any sub-partitions of the partition.
showcube [partition]
display information about the nodes which comprise a particular partition.
getcube [-l list,of,nodes | -n number_of_nodes] partition-name
Create a new partition with the given name. The options which specify the construction of the new partition are
-l list,of,nodes
a comma-separated list of node numbers from the enclosing partition. For example, if the enclosing partition has 8 nodes, the numbers 0-7 are valid node numbers.
-n number_of_nodes
is followed by the number of nodes the new partition should contain.
-m mode
A Unix-like protection system is currently used to control access to partitions and nodes. This specifies which mode the partition should be created with. There is currently no method of changing the mode of a partition after it is created.
rmcube partition.name
removes the named partition, freeing the nodes to be used by someone else.
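
Putting these together, a short session might look like the following sketch (the partition name 'demo' and the mode are made up for illustration):

cow% lscube
cow% getcube -n2 -m 755 demo
cow% showcube demo
cow% rmcube demo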

Cow Hostler

The cow hostler provides for the configuration of individual nodes and the execution of programs on multiple nodes. The hostler also has multiple components ...
  1. A server, cud.pl, started on demand from inetd.
  2. A configurator, pconfig, which allows the setup of a pre-arranged configuration on a set of nodes.
  3. A program starter, prun, which executes programs on multiple nodes in parallel.
The various hostler components may be found in the cow pen, /p/cow/pen/bin.

The pre-arranged configurations are shell scripts stored in /p/cow/pen/rbin with support files in /p/cow/pen/rlib.
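
To see which pre-arranged configurations are currently available, just list that directory:

cow% ls /p/cow/pen/rbin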

Commands

pconfig -c <configuration> -p <partition>
Add and configure <configuration> onto the nodes in <partition>. A configuration will only be added to a node once; specifying it a second time will do nothing.
pconfig -C <configuration> -p <partition>
Delete configurations added to the nodes in <partition> until <configuration> itself is deleted.
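
For example, to add and later remove the IP-over-Myrinet configuration (this uses the 'myri-tcp' configuration and the 'markos.tcp' partition from the walkthrough later in this document):

cow% pconfig -c myri-tcp -p markos.tcp
cow% pconfig -C myri-tcp -p markos.tcp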

Configurations

The following configurations are available.
myrinet/ip
Enable IP packets to flow across the myrinet.
misc/*
These configurations are random one-shots which bolo uses for various things. They really aren't designed to be used by anyone else.
mlanai-driver
myri-driver
myri-tcp
public-net
Most of these configurations no longer have any use. They should be avoided.

Distributed Job Manager

The Distributed Job Manager, DJM, is a batch scheduling system designed for users who don't care to run jobs interactively, or who want to schedule a large number of jobs which can be run unattended. DJM originated as software used on the CM-5.

The person who knew anything about DJM has left the UW; currently we are clueless as to what it actually does.

DJM is currently used for the following COW functions:

  1. Reservations
  2. Batch Scheduler
  3. Program Executor (for batch jobs)

DJM is also structured in multiple pieces:

  1. A daemon, cow_starter, started from the COW init.d script. I believe this part may execute batch jobs.
  2. Another daemon, cow_master, started from the cow_starter daemon. I think this part is the combination reservation and batch scheduler.
  3. A set of utilities to communicate with the daemons. These are located in /p/cow/bin.

Commands

The cryptic documentation (as if this isn't cryptic already :-) for creserve, cfree, and cstat may be found in /p/cow/user-info/creserve.man.
creserve
Request a set of nodes to use.
cstat
Display the status of the nodes and reservations for them.
cfree
Release a reservation.

Example use of ShoreComm

It is no coincidence that the name of the "hypothetical user" in the following is markos. He helped me write the original version of this document.

Finding some nodes to use

In general, the partition commands shouldn't be used to create or release "top level" partitions (aka those in the "root" directory). DJM controls this level of the namespace, and all requests for top level partitions must be granted by DJM. The creserve and cfree commands are used to perform these actions.

For example, say a hypothetical user, 'markos', wanted to use some of the 16 nodes. He wants them RIGHT NOW!!!! For the rest of the evening, and he doesn't want DJM mucking around on them ... (P.S. the creserve and cfree commands are in /p/cow/bin)

cow% creserve -nodes 4 -mode nosched -until 11pm
Reservation #8, has 4 nodes (4-7), from now until Jun 21 23:00.
This reserves a 4 node partition for markos from NOW until 11pm today. DJM will not schedule batch jobs in this partition, so markos is free to use it himself. The partition will be in the root of the partition namespace, and its name will default to the user's name, 'markos' in this case. An explicit partition name may be specified; see the creserve documentation. The "reservation number" is used to identify the partition when using DJM commands. If markos finishes his work early (haha! :-) he can release the nodes he is using with cfree:
cow% cfree 8
Reservation #8 deleted.
If your reservation conflicts with another, you can use the 'cstat' command to view registered reservations.
cow% cstat res
Running scheduled partitions:
PART        BNODE NODES LOAD
*** None ***

Currently reserved:
 # PART       NODES NP USER     GROUP    PRM MODE    UNTIL
 4 -nopart-   16-23  8 bolo     bolo     754 nosched Indefinite  
 6 -nopart-   24-27  4 swartz   swartz   755 nosched Indefinite  
 5 wwt-pen    28-35  8 swartz   swartz   777 nosched Indefinite  

Currently Free Nodes: 0 - 15

Partitioning Nodes

If markos wanted to use a subset of his nodes, he could now use getcube to partition his nodes into whatever subsets he wanted to use. The configuration software configures entire partitions, so if markos wanted some of his nodes configured differently from the rest, he would need to place them in their own partition.

Markos would like to test a shore client and server; he only needs to use 2 of his 4 nodes to do this. He could configure all of them to use ip over myrinet, but I'll use this as a place to demonstrate getcube.

cow% getcube -n2 .markos.tcp
getcube 'markos.tcp' ok
Creates a 2 node partition which we will configure for IP over myrinet.

You can use 'showcube' to find out which nodes you are actually using.

cow% showcube markos.tcp
name	.markos.tcp
numnodes	2
user	2411
group	2411
mode	0775
0	cowe04	cowe04	cowa04	cannex1/5005	1
1	cowe05	cowe05	cowa05	cannex1/5006	1
showcube 'markos.tcp' ok

Configuration

By default, the nodes of the cow are set up as generic unix systems. The default "mode" of the myrinet is to communicate using the myrinet API. Other communication options, such as IP over Myrinet and Illinois "Fast Message" networking, are available. Other configuration options exist too; a list of them can be found by 'ls /p/cow/pen/rbin'. If you don't know what a particular configuration does, you probably don't want to use it. Some of these configuration options are mutually exclusive, some aren't; there is no way to tell a priori.

The configuration or package name for IP over Myrinet is called "myri-tcp". To configure a partition, use the "pconfig" command (which is found in /p/cow/pen/bin):

cow% pconfig -c myri-tcp -p markos.tcp
This configures IP over Myrinet on the 2 nodes which markos specified. To remove a configuration, the '-C' option is used instead of '-c'.

There is a command called 'prun' which allows you to run a command on all or some of the nodes in a partition. I don't find it particularly useful for what I do, so someone else can write documentation about it!

Shore Comm itself

Start an xterm and rlogin to the node(s) you will be using. Markos is using shore comm, so he will need to set up some environment variables to make shore comm communicate over the myrinet instead of the ethernet.
cowe04% setenv OCOMM_TCP 'cowm04/any'
cowe05% setenv OCOMM_TCP 'cowm05/any'
cowe0X% cd shore/src/object_comm/ns
cowe04% ./ns .ns
cowe05% ./query -f .ns
ns> enter HI-MOM! 
ns> shutdown
ns> ^D
cowe0X% exit

How to undo all of this?

cow% pconfig -C myri-tcp -p markos.tcp
cow% rmcube markos.tcp
cow% cfree 8
and we're done.
Last Modified: Mon Oct 16 14:10:45 CDT 1995
bolo (Josef Burger) <bolo@cs.wisc.edu>