Kangaroo Technical Manual

Version 0.6, 9-October-2001
This manual may be out of date. Please check the Kangaroo Web Page for the most recent version.
Kangaroo is Copyright (C) 2001 Douglas Thain. Permission is granted to use the source code for research and study purposes. Other uses must be cleared with the copyright holder.

Overview

Kangaroo is a simple data movement system.

A network of Kangaroo servers can provide fault-tolerant, high-throughput data movement services for arbitrary applications. Kangaroo accepts data from an application and then does whatever it takes to get the data delivered to its destination.

Kangaroo is not a high-performance file transfer mechanism. If your goal is to move small files with the least latency possible, then Kangaroo is probably not for you.

However, Kangaroo can provide higher throughput for applications that alternate CPU and I/O bursts. Kangaroo works in the background, so applications are not held hostage to network failures and performance variations.

Getting Started

Supported Platforms

Kangaroo ought to run on almost any POSIX-compliant system. It has been known to build and run without problems on the following systems:

We expect it will compile on any other POSIX-like system, perhaps with a few bug fixes. We're happy to accept bug (and success) reports from the field.

Installation

First, download a Kangaroo distribution from the Kangaroo home page. You may download either a source package or a binary distribution. Please skip to the relevant section.

Source Installation

First, unpack the tarball in a scratch directory.
% gunzip kangaroo.tar.gz
% tar xvf kangaroo.tar
Now, decide on a place to install Kangaroo. I'll use /home/fred/kangaroo as an example. Run configure with this directory given as an option, then make and make install.
% cd kangaroo
% ./configure --prefix /home/fred/kangaroo
% make
% make install
That's all! Please skip to "Runtime Setup."

Binary Installation

Simply unpack the tarball in any directory you like. I'll use /home/fred/kangaroo as an example.
% cd /home/fred
% gunzip kangaroo-xxx-yyy.tar.gz
% tar xvf kangaroo-xxx-yyy.tar
You're done!

Runtime Setup

Before running any programs, you should set some variables that will direct Kangaroo and your shell to the relevant directories. If you are using a C-like shell:
% setenv KANGAROO_INSTALL_DIR /home/fred/kangaroo
% setenv PATH ${KANGAROO_INSTALL_DIR}/bin:${PATH}

Next, edit the authorized users file to determine what nodes will be allowed to participate:

% vi ${KANGAROO_INSTALL_DIR}/etc/kangaroo.authusers
In this file, you should list all the hosts running Kangaroo. Here is an example:
localhost
*.cs.university.com
laptop.wireless.net

A more fine-grained security mechanism is available if you are using the Globus tools. See the security section below.

Finally, on every participating node, start the kangaroo server:

% kangaroo_server

Examples

This section will assume that you are sitting at a host called here.cs.wisc.edu and are running Kangaroo servers on two hosts here.cs.wisc.edu and there.cs.wisc.edu.

The Kangaroo Client

The Kangaroo client program is called simply kangaroo. To interact with the Kangaroo network, just start the client with the name of any Kangaroo server as an argument. By default, the client will connect to the local host.
% kangaroo
connected to nearest server: 127.0.0.1
 kangaroo> 
The program is similar to an FTP client. You use commands like put and get to send and retrieve files from remote servers. However, the Kangaroo client is different from FTP in one very important way: you may interact with any Kangaroo host, merely using the nearest server as a proxy. For example, to get a file from there, you must give the host name, the remote file, and the local file:
 kangaroo> get there.cs.wisc.edu /etc/hosts /tmp/hosts
To queue a file to be sent, give the local file name, the remote host, and the remote file:
 kangaroo> put /etc/hosts there.cs.wisc.edu /tmp/output
If you wish to wait until all the data are sent, then use the push command:
 kangaroo> push there.cs.wisc.edu /tmp/output
On the other hand, if you simply wish to exit without waiting for the file to be sent, then commit your changes and exit the client:
 kangaroo> commit
 kangaroo> quit
To run Kangaroo commands from a shell script, simply use the 'here document' notation in sh or csh. Don't forget to terminate any sequence of commands with a commit or a push.
#!/bin/sh
kangaroo << EOF
put /etc/hosts there.cs.wisc.edu /tmp/output
commit
EOF
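
If you would rather generate such scripts from a program, here is a minimal Python sketch that builds the command text for a batch of files. The function name build_kangaroo_script and its wait argument are my own inventions for illustration, not part of Kangaroo; only the put, push, and commit commands come from the examples above.

```python
def build_kangaroo_script(transfers, wait=False):
    """Return command text to feed to the kangaroo client on standard input.

    transfers: list of (local_file, remote_host, remote_file) tuples.
    wait: if True, finish with one push per file so the client blocks until
          the data are sent; otherwise finish with a single commit.
    """
    transfers = list(transfers)
    lines = []
    for local_file, remote_host, remote_file in transfers:
        lines.append(f"put {local_file} {remote_host} {remote_file}")
    if wait:
        for _, remote_host, remote_file in transfers:
            lines.append(f"push {remote_host} {remote_file}")
    else:
        lines.append("commit")
    return "\n".join(lines) + "\n"


script = build_kangaroo_script(
    [("/etc/hosts", "there.cs.wisc.edu", "/tmp/output")]
)
print(script, end="")
# To run it, hand the text to the client, for example:
#   subprocess.run(["kangaroo"], input=script, text=True)
```

The resulting text is the same as the body of the here document above, so it can be piped to the client in exactly the same way.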

The Status Report

A web browser can be used to check on the status of a Kangaroo node and watch the progress of data in motion. Simply start a web browser and connect to port 9096 on a host running a Kangaroo server. (This is usually specified like so: http://there.cs.wisc.edu:9096.) From there, you may view the data in progress by destination host, destination file, or even by individual messages. If the server is unable to move the data, you will also see error messages. For example:

here.cs.wisc.edu kangaroo server

kangaroo version 0.3 Aug 6 2001 09:42:29 Web Page, Manual, Operator

Kangaroo is stopped: couldn't send 'put /tmp/output' to www.xxx.yyy.zzz because Connection refused
View by: Host File Message

Queued Messages:

Destination        File          Message   Number Size
www.xxx.yyy.zzz    /tmp/output   put       193.8 KB
Total:                                     0 0
In this case, the sender cannot contact the receiver. The receiving host is refusing connections because someone (the author) forgot to start the server. Oops! After starting a server at the destination, the new, happier report is:

coral.cs.wisc.edu kangaroo server

kangaroo version 0.3 Aug 6 2001 09:42:29 Web Page, Manual, Operator

Kangaroo is running normally.
View by: Host File Message

Queued Messages:

Destination        File          Message   Number Size
Total:                                     0 0
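
If you want a script rather than a person to watch the status report, a rough Python sketch follows. It assumes the two status sentences shown in the reports above ("Kangaroo is running normally." and "Kangaroo is stopped: ...") each appear on a line of their own in the page text. The real page is HTML and its markup may differ, so treat parse_status as a hypothetical helper, not a documented interface.

```python
def parse_status(page_text):
    """Classify a status report as 'running', 'stopped', or 'unknown'.

    Returns a (state, reason) pair; reason is None unless the server
    reports that it is stopped.
    """
    stopped_prefix = "Kangaroo is stopped:"
    for line in page_text.splitlines():
        line = line.strip()
        if line.startswith("Kangaroo is running normally"):
            return ("running", None)
        if line.startswith(stopped_prefix):
            return ("stopped", line[len(stopped_prefix):].strip())
    return ("unknown", None)


sample = """here.cs.wisc.edu kangaroo server
Kangaroo is stopped: couldn't send 'put /tmp/output' to www.xxx.yyy.zzz because Connection refused
"""
print(parse_status(sample))
# The page text itself could be fetched with, for example:
#   urllib.request.urlopen("http://there.cs.wisc.edu:9096").read().decode()
```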

The Email Warning

Kangaroo is very persistent in moving data. No simple failure -- such as a broken network, a crashed host, or a missing directory -- will cause your data to be lost. If Kangaroo is unable to move data, it will pause briefly and try again later. Initially, it pauses five seconds. Subsequent failures increase the pause time exponentially, up to a maximum of one minute. If Kangaroo is stopped by any condition lasting more than five minutes, it will notify the server's owner by sending email. To avoid overwhelming you with mail, Kangaroo will not notify you twice for the same problem, nor will it notify you more than once in four hours. For example, here is an email warning generated by the same problem described above:
Date: Mon, 6 Aug 2001 10:38:34 -0500 (CDT)
From: Douglas Thain 
To: thain
Subject: [Kangaroo] problem

This is a recorded message from the Kangaroo server at www.xxx.yyy.zzz.
I am currently unable to move data:
        couldn't send 'put /tmp/output' to www.xxx.yyy.zzz because Connection refused

I will not send you any more notices before Mon Aug  6 14:38:34 2001

kangaroo version 0.3 Aug  6 2001 09:42:29
For more information about Kangaroo, see http://www.cs.wisc.edu/condor/kangaroo
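
To get a feel for the retry schedule described above, here is a small Python sketch. The five-second start, the one-minute ceiling, and the five-minute notification threshold come from the text. The doubling factor is my assumption, since the manual only says the pause grows "exponentially", and the sketch ignores the time spent on the failed attempts themselves.

```python
from itertools import islice


def retry_delays(initial=5, cap=60, factor=2):
    """Yield successive pause times in seconds: exponential growth, capped."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def pauses_before_notice(threshold=300):
    """Count the pauses needed before the total wait exceeds the threshold."""
    total = 0
    for count, delay in enumerate(retry_delays(), start=1):
        total += delay
        if total > threshold:
            return count


print(list(islice(retry_delays(), 6)))  # [5, 10, 20, 40, 60, 60]
print(pauses_before_notice())           # 8
```

Under these assumptions, a persistent failure would be retried roughly eight times before the first email goes out.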

Kangaroo via PFS

The Kangaroo client tool is useful for moving whole files here and there, but it is clumsy to use. A more natural interface is available through the Pluggable File System. (You'll have to download and install it separately.) With PFS installed, you may simply access Kangaroo as a file system, like so:
% pfsrun tcsh
% vi /kangaroo/there/tmp/data
% gcc test.c -o /kangaroo/there/home/fred/test.exe
PFS transparently converts an application's UNIX operations into the corresponding Kangaroo operations. Administrative commands such as push and route must still be accessed from the Kangaroo client in the usual way.

Options and Miscellanea

This section describes some of the more arcane features of Kangaroo. If this is your first trip through the manual, I recommend that you stop here and try your hand at the examples.

Security

Kangaroo provides a simple security model which authenticates one host or one process to another. If multiple users of a single server don't trust each other, then we can't help you there.

Security concerns can be split up into authentication and authorization. The former is concerned with finding out who is calling. The latter is concerned with determining whether the caller, once known, has permission to proceed. Kangaroo has two authentication mechanisms and one authorization mechanism.

The two authentication mechanisms are trivial and Globus. Trivial authentication simply looks up the DNS name of the calling host. Globus authentication uses public key encryption to determine the X.509 name of the calling process. You may use either or both to limit the clients that can contact a particular server. The list of accepted clients is given in the "authorized users" file, which is $KANGAROO_INSTALL_DIR/etc/kangaroo.authusers. You should edit this file to contain the host names or X.509 subjects you are willing to admit. For example, if you are willing to accept connections from the local machine, any machine in your department, and Fred from any other machine, your authorized users file might look like this:

localhost
*.construction.bedrock.gov
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
Note 1: The only hitch to using Globus authentication is that every participating process must be running a grid proxy. Before running any clients such as kangaroo_status, you must run grid-proxy-init and enter your passphrase. You must also do the same for every host running a server!

Note 2: Security applies to server-server connections as well as client-server connections. If you are using kangaroo_put at host A to send data to host B, then host A and B must be listed in each other's authorized users file.

Note 3: Debugging security problems can be a real hassle. If things don't seem to be working right, try running the servers and tools with the -d option. Messages beginning with auth can give some hints into what is going on. For example, if a proxy has expired or was never loaded, you will see:

auth_globus: couldn't load my credentials: did you run grid-proxy-init?
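
When an authorized users file is not admitting the clients you expect, it can help to check your patterns offline. The Python sketch below does this under one loud assumption: that the * in an entry behaves like a shell-style wildcard. The manual does not spell out the exact matching rules, and load_patterns and is_authorized are names made up for this illustration.

```python
from fnmatch import fnmatchcase


def load_patterns(text):
    """Return the non-blank lines of an authorized users file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_authorized(identity, patterns):
    """True if a host name or X.509 subject matches any listed pattern."""
    return any(fnmatchcase(identity, pattern) for pattern in patterns)


AUTHUSERS = """
localhost
*.construction.bedrock.gov
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
"""

patterns = load_patterns(AUTHUSERS)
print(is_authorized("quarry.construction.bedrock.gov", patterns))  # True
print(is_authorized("somewhere.else.example", patterns))           # False
```

The host names quarry.construction.bedrock.gov and somewhere.else.example are invented test inputs, not hosts mentioned elsewhere in this manual.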

Routing

A complicated Kangaroo system needs a routing file in order to direct traffic where it needs to go. The routing file is normally installed in $KANGAROO_INSTALL_DIR/etc/kangaroo.routing. Most simple systems do not need any special instructions in the routing file. By default, the supplied routing table will work for any one-hop Kangaroo configuration.

Regardless of the number of hosts participating, if you always want your data to go to the local server, and from there to the target server, there is no need to edit the routing file.

However, you will need to set up some explicit rules in order to create a multi-hop network. At each node where data does not flow directly to the target, a rule must be supplied that directs the flow of traffic to the appropriate node.

Each rule in the routing file is a triple of host names or IP addresses. The first name lists the host(s) to which the rule applies. The second gives the destination address, and the third gives the hop to be used for that destination.

If multiple rules match, the last rule matched will be used. So, start with the most generic rules, and end with the most specific.

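Certain special hop names can be used to control the behavior of a server. The example below uses three of them: @DIRECT sends the data straight to the destination host, @HOLD holds the data at this server for the time being, and @STORE accepts the data at this server on the destination's behalf.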

Of course, designing and debugging such files can be tricky. To assist in this process the kangaroo_traceroute tool can give you a quick check of the distributed system:

% kangaroo_traceroute -h there

Here is a complete example for the mythical company Stooges, Inc. Suppose that we have three hosts: Larry, Curly, and Moe. Larry and Curly share a filesystem by NFS, so storing a file at one is equivalent to storing it at the other. Moe is a big server that sits closer to the organization's internet connection, so it will be used for outgoing data to all hosts. We will send all outgoing data to our network provider at Upstream Communications. Further, our friends at Yoyodyne have requested that we stop sending them data for a day or so.

Here is what the routing file would look like:


#
# Moe sends data for curly or larry directly there.
# Data destined for yoyodyne is held at the moment.
# Otherwise, send it all to upstream.
#

moe.stooges.com     *                   upstream.com
moe.stooges.com     kang.yoyodyne.com   @HOLD
moe.stooges.com     larry.stooges.com   @DIRECT
moe.stooges.com     curly.stooges.com   @DIRECT

#
# Larry sends most traffic to moe.
# Traffic bound for curly can be accepted here,
# because they share a file system.
#

larry.stooges.com   *                   moe.stooges.com
larry.stooges.com   curly.stooges.com   @STORE

#
# Curly is pretty similar to larry:
#

curly.stooges.com   *                   moe.stooges.com
curly.stooges.com   larry.stooges.com   @STORE
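
To check how a routing file like this one will be interpreted, the last-match-wins rule can be sketched in a few lines of Python. This is only an illustration of the rule as described in this section, not Kangaroo's actual code. It assumes that * is a shell-style wildcard and that lines beginning with # are comments. The names parse_routes and next_hop are hypothetical, and www.example.org is an invented destination.

```python
from fnmatch import fnmatchcase


def parse_routes(text):
    """Parse routing-file text into (host, destination, hop) triples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        host, destination, hop = line.split()
        rules.append((host, destination, hop))
    return rules


def next_hop(rules, host, destination):
    """Return the hop from the last rule matching host and destination."""
    chosen = None
    for rule_host, rule_destination, hop in rules:
        if fnmatchcase(host, rule_host) and fnmatchcase(destination, rule_destination):
            chosen = hop  # later matches override earlier ones
    return chosen


EXAMPLE = """
# Moe's rules, copied from the routing file above.
moe.stooges.com     *                   upstream.com
moe.stooges.com     kang.yoyodyne.com   @HOLD
moe.stooges.com     larry.stooges.com   @DIRECT
"""

rules = parse_routes(EXAMPLE)
print(next_hop(rules, "moe.stooges.com", "kang.yoyodyne.com"))  # @HOLD
print(next_hop(rules, "moe.stooges.com", "www.example.org"))    # upstream.com
```

Because the generic * rule comes first and the specific rules follow, the specific rules win, which is why the file is ordered from most generic to most specific.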

Configuration Values

The Kangaroo server, client, and PFS module may all be configured with a number of settings. This section lists all of the various and sundry settings referred to in the manual. There are two ways to change these settings:
  1. Via environment variables
    The easiest way to change the behavior of all of the Kangaroo components is to set an environment variable. Change the configuration name to upper case, and substitute underscores for periods. For example, to set kangaroo.notify.delay to 600 seconds:
    setenv KANGAROO_NOTIFY_DELAY 600
    
  2. Via a configuration file
    If many hosts are to share the same configuration, it may be easier to place all the settings in a file. Simply edit ${KANGAROO_INSTALL_DIR}/etc/kangaroo.config and fill it with configuration settings like this:
    kangaroo.notify.delay: 600
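
The two forms are easy to convert between. Here is a short Python sketch of the naming rule and of the "name: value" file format shown above. The helper names env_name and parse_config are mine, and the sketch says nothing about which source Kangaroo prefers when a setting appears in both places, because the manual does not say.

```python
def env_name(setting):
    """Convert a configuration name to its environment variable name."""
    return setting.upper().replace(".", "_")


def parse_config(text):
    """Parse 'name: value' lines into a dictionary of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


print(env_name("kangaroo.notify.delay"))          # KANGAROO_NOTIFY_DELAY
print(parse_config("kangaroo.notify.delay: 600"))  # {'kangaroo.notify.delay': '600'}
```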
    
The allowed configuration settings are: