A network of Kangaroo servers can provide fault-tolerant, high-throughput data movement services for arbitrary applications. Kangaroo accepts data from an application and then does whatever it takes to get the data delivered to its destination.
Kangaroo is not a high-performance file transfer mechanism. If your goal is to move small files with the least latency possible, then Kangaroo is probably not for you.
However, Kangaroo can provide higher throughput for applications that alternate CPU and I/O bursts. Kangaroo works in the background, so applications are not held hostage to network failures and performance variations.
% gunzip kangaroo.tar.gz
% tar xvf kangaroo.tar
Choose a directory in which to install Kangaroo; we'll use /home/fred/kangaroo as an example. Run configure with this directory given as the --prefix option, then make and make install.
% cd kangaroo
% ./configure --prefix /home/fred/kangaroo
% make
% make install
If you have a prebuilt distribution instead, simply unpack it, again using /home/fred/kangaroo as an example:
% cd /home/fred
% gunzip kangaroo-xxx-yyy.tar.gz
% tar xvf kangaroo-xxx-yyy.tar
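Whichever way you installed it, set KANGAROO_INSTALL_DIR and add its bin directory to your PATH (these are csh commands):
% setenv KANGAROO_INSTALL_DIR /home/fred/kangaroo
% setenv PATH ${KANGAROO_INSTALL_DIR}/bin:${PATH}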
Next, edit the authorized users file to determine what nodes will be allowed to participate:
% vi ${KANGAROO_INSTALL_DIR}/etc/kangaroo.authusers
localhost
*.cs.university.com
laptop.wireless.net
A more fine-grained security mechanism is available if you are using the Globus tools. See the security section below.
Finally, on every participating node, start the kangaroo server:
% kangaroo_server
Suppose that you are logged in to here.cs.wisc.edu and are running Kangaroo servers on two hosts, here.cs.wisc.edu and there.cs.wisc.edu.
The Kangaroo client is called kangaroo. To interact with the Kangaroo network, just start the client with the name of any Kangaroo server as an argument. By default, the client will connect to the local host.
% kangaroo
connected to nearest server: 127.0.0.1
kangaroo> get there.cs.wisc.edu /etc/hosts /tmp/hosts
kangaroo> put /etc/hosts there.cs.wisc.edu /tmp/output
kangaroo> push there.cs.wisc.edu /tmp/output
kangaroo> commit
kangaroo> quit
You can also drive the client from a script written in sh or csh. Don't forget to terminate any sequence of commands with a commit or a push.
#!/bin/sh
kangaroo << EOF
put /etc/hosts there.cs.wisc.edu /tmp/output
commit
EOF
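The same kind of batch can also be generated from a program. The Python sketch below is illustrative only: `build_script` and `run_kangaroo` are hypothetical helper names, and it assumes (as the here-document above suggests) that `kangaroo` reads its commands from standard input and splits arguments on whitespace, so paths containing spaces are rejected rather than quoted.

```python
import subprocess
from typing import Iterable, Tuple

# (local_path, remote_host, remote_path), mirroring "put /etc/hosts there.cs.wisc.edu /tmp/output"
Transfer = Tuple[str, str, str]


def build_script(transfers: Iterable[Transfer], finish: str = "commit") -> str:
    """Return client commands: one put per transfer, followed by `finish`."""
    lines = []
    for local, host, remote in transfers:
        for arg in (local, host, remote):
            # Assumption: the client splits arguments on whitespace, so refuse
            # anything it could not parse back unambiguously.
            if not arg or any(ch.isspace() for ch in arg):
                raise ValueError(f"unsupported argument: {arg!r}")
        lines.append(f"put {local} {host} {remote}")
    lines.append(finish)  # always end the batch with commit (or a push command)
    return "\n".join(lines) + "\n"


def run_kangaroo(script: str) -> None:
    """Feed the script to the kangaroo client on stdin, like the here-document."""
    subprocess.run(["kangaroo"], input=script, text=True, check=True)
```

For example, `run_kangaroo(build_script([("/etc/hosts", "there.cs.wisc.edu", "/tmp/output")]))` sends the same two commands as the shell script. Appending the terminating command in one place makes it hard to forget.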
Destination     | File        | Message | Number | Size    |
www.xxx.yyy.zzz | /tmp/output | put     | 1      | 93.8 KB |
Total:          |             |         | 0      | 0       |
In this case, the sender cannot contact the receiver. The receiving host is refusing connections because someone (the author) forgot to start the server. Oops! After starting a server at the destination, the new, happier report is:
Destination | File | Message | Number | Size |
Total:      |      |         | 0      | 0    |
Kangaroo is very persistent in moving data. No simple failure (such as a broken network, a crashed host, or a missing directory) will cause your data to be lost. If Kangaroo is unable to move data, it will pause briefly and try again later. Initially, it pauses for five seconds; each subsequent failure increases the pause exponentially, up to a maximum of one minute. If Kangaroo is stopped by any condition lasting more than five minutes, it will notify the server's owner by email. To avoid overwhelming you with mail, Kangaroo will not notify you twice about the same problem, nor more than once every four hours. For example, here is the email warning generated by the problem described above:
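The retry schedule can be sketched as follows. The text above says only that the pause grows exponentially from five seconds to a one-minute ceiling; this sketch assumes the pause doubles after each failure, so the factor of 2 and the name `retry_delays` are assumptions, not Kangaroo's actual code.

```python
from typing import List


def retry_delays(failures: int, initial: float = 5.0,
                 factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Pause (in seconds) taken after each of `failures` consecutive failures.

    Assumption: the pause doubles each time (factor=2.0); the manual only
    says it grows exponentially from five seconds up to one minute.
    """
    delays = []
    delay = initial
    for _ in range(failures):
        delays.append(delay)
        delay = min(delay * factor, cap)  # never wait longer than the cap
    return delays


print(retry_delays(7))  # [5.0, 10.0, 20.0, 40.0, 60.0, 60.0, 60.0]
```

Under this assumption the pauses alone add up to 315 seconds after eight consecutive failures, so a persistent problem crosses the five-minute notification threshold after roughly that many retries.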
Date: Mon, 6 Aug 2001 10:38:34 -0500 (CDT)
From: Douglas Thain
To: thain
Subject: [Kangaroo] problem

This is a recorded message from the Kangaroo server at www.xxx.yyy.zzz.
I am currently unable to move data:
couldn't send 'put /tmp/output' to www.xxx.yyy.zzz because Connection refused
I will not send you any more notices before Mon Aug 6 14:38:34 2001

kangaroo version 0.3 Aug 6 2001 09:42:29
For more information about Kangaroo, see http://www.cs.wisc.edu/condor/kangaroo
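The notification rules can be modeled as a small throttle. This is a hypothetical sketch, not Kangaroo's implementation: the `Notifier` class, identifying a "problem" by a message string, and the exact boundary handling are all assumptions. With a first notice sent at 10:38:34, it allows the next one no earlier than 14:38:34, which matches the date in the message above.

```python
from datetime import datetime, timedelta
from typing import Optional, Set

STALL_THRESHOLD = timedelta(minutes=5)  # a problem must persist longer than this
QUIET_PERIOD = timedelta(hours=4)       # at most one message per four hours


class Notifier:
    """Hypothetical model of Kangaroo's email throttling rules."""

    def __init__(self) -> None:
        self.reported: Set[str] = set()          # problems already mailed about
        self.last_sent: Optional[datetime] = None

    def should_notify(self, problem: str, stalled_since: datetime, now: datetime) -> bool:
        if now - stalled_since <= STALL_THRESHOLD:
            return False                         # not stuck long enough yet
        if problem in self.reported:
            return False                         # never mail twice about one problem
        if self.last_sent is not None and now - self.last_sent < QUIET_PERIOD:
            return False                         # still inside the quiet period
        self.reported.add(problem)
        self.last_sent = now
        return True

    def next_allowed(self) -> Optional[datetime]:
        """Earliest time another notice may go out (None if none sent yet)."""
        return None if self.last_sent is None else self.last_sent + QUIET_PERIOD
```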
The Kangaroo client tool is useful for moving whole files here and there, but it is clumsy to use. A more natural interface is available through the Pluggable File System (PFS). (You'll have to download and install it separately.) With PFS installed, you may simply access Kangaroo as a file system, like so:
% pfsrun tcsh
% vi /kangaroo/there/tmp/data
% gcc test.c -o /kangaroo/there/home/fred/test.exe
PFS transparently converts an application's UNIX operations into the corresponding Kangaroo operations.
Administrative commands such as push and route must still be accessed from the Kangaroo client in the usual way.
This section describes some of the more arcane features of Kangaroo. If this is your first trip through the manual, I recommend that you stop here and try your hand at the examples.
Kangaroo provides a simple security model which authenticates one host or one process to another. If multiple users of a single server don't trust each other, then we can't help you there.
Security concerns can be split up into authentication and authorization. The former is concerned with finding out who is calling. The latter is concerned with determining whether the caller, once known, has permission to proceed. Kangaroo has two authentication mechanisms and one authorization mechanism.
The two authentication mechanisms are trivial and Globus. Trivial authentication simply looks up the DNS name of the calling host. Globus authentication uses public key encryption to determine the X.509 name of the calling process. You may use either or both to limit the clients that can contact a particular server. The list of accepted clients is given in the "authorized users" file, which is $KANGAROO_INSTALL_DIR/etc/kangaroo.authusers. You should edit this file to contain the host names or X.509 subjects you are willing to admit. For example, if you are willing to accept connections from the local machine, any machine in your department, and Fred from any other machine, your authorized users file might look like this:
localhost
*.construction.bedrock.gov
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
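The authorization check amounts to matching the caller's identity against each entry. Below is a minimal sketch, assuming one entry per line (X.509 subjects contain spaces, so whole lines are kept) and that `*` behaves like a shell-style wildcard; the function names are hypothetical and this is not Kangaroo's own matching code.

```python
from fnmatch import fnmatchcase
from typing import List

AUTHUSERS = """\
localhost
*.construction.bedrock.gov
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
"""


def load_authusers(text: str) -> List[str]:
    """One entry per line; blank lines are skipped, spaces inside entries kept."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_authorized(caller: str, entries: List[str]) -> bool:
    """caller is a DNS name (trivial auth) or an X.509 subject (Globus auth)."""
    # Assumption: '*' is a shell-style wildcard; other entries must match exactly.
    return any(fnmatchcase(caller, entry) for entry in entries)
```

Under these assumptions, `quarry.construction.bedrock.gov` and Fred's exact subject are admitted, while a different person in the same organizational unit is not, because the subject line carries no wildcard.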
The only hitch to using Globus authentication is that every participating process must be running a grid proxy. Before running any clients such as kangaroo_status, you must run grid-proxy-init and enter your passphrase. You must also do the same for every host running a server!
Note 2: Security applies to server-server connections as well as client-server connections. If you are using kangaroo_put at host A to send data to host B, then hosts A and B must be listed in each other's authorized users files.
Note 3: Debugging security problems can be a real hassle. If things don't seem to be working right, try running the servers and tools with the -d option. Messages beginning with auth can give some hints about what is going on. For example, if a proxy has expired or was never loaded, you will see:
auth_globus: couldn't load my credentials: did you run grid-proxy-init?
A complicated Kangaroo system needs a routing file in order to direct traffic where it needs to go. The routing file is normally installed in $KANGAROO_INSTALL_DIR/etc/kangaroo.routing. Most simple systems do not need any special instructions in the routing file. By default, the supplied routing table will work for any one-hop Kangaroo configuration.
Regardless of the number of hosts participating, if you always want your data to go to the local server, and from there to the target server, there is no need to edit the routing file.
However, you will need to set up some explicit rules in order to create a multi-hop network. At each node where data does not flow directly to the target, a rule must be supplied that directs the flow of traffic to the appropriate node.
Each rule in the routing file is a triple of host names or IP addresses. The first name lists the host(s) to which the rule applies. The second gives the destination address, and the third gives the hop to be used for that destination.
If multiple rules match, the last rule matched will be used. So, start with the most generic rules, and end with the most specific.
Certain special hop names can be used to control the behavior of a server:
@DIRECT - Connect directly to the destination.
@STORE - Store matching chunks on this filesystem.
@HOLD - Hold matching chunks at this node.
Of course, designing and debugging such files can be tricky. To assist in this process, the kangaroo_traceroute tool can give you a quick check of the distributed system:
% kangaroo_traceroute -h there
Here is a complete example for the mythical company Stooges, Inc. Suppose that we have three hosts: Larry, Curly, and Moe. Larry and Curly share a filesystem by NFS, so storing a file at one is equivalent to storing a file at another. Moe is a big server that sits closer to the organization's internet connection, so it will be used for outgoing data to all hosts. We will send all outgoing data to our network provider at Upstream Communications. Further, our friends at Yoyodyne have requested that we stop sending them data for a day or so.
Here is what the routing file would look like:
#
# Moe sends data for curly or larry directly there.
# Data destined for yoyodyne is held at the moment.
# Otherwise, send it all to upstream.
#
moe.stooges.com * upstream.com
moe.stooges.com kang.yoyodyne.com @HOLD
moe.stooges.com larry.stooges.com @DIRECT
moe.stooges.com curly.stooges.com @DIRECT
#
# Larry sends most traffic to moe.
# Traffic bound for curly can be accepted here,
# because they share a file system.
#
larry.stooges.com * moe.stooges.com
larry.stooges.com curly.stooges.com @STORE
#
# Curly is pretty similar to larry:
#
curly.stooges.com * moe.stooges.com
curly.stooges.com larry.stooges.com @STORE
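To check your understanding of the "last match wins" rule before deploying a file like this, the lookup can be modeled in a few lines. This is a hypothetical sketch, not Kangaroo's code and not the output of kangaroo_traceroute: it assumes `*` is a shell-style wildcard, that a node with no matching rule connects directly (the one-hop default described earlier), and that any `@` hop ends the trace. It uses the Stooges rules with most comments trimmed.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str, str]  # (applies_to, destination, hop)

STOOGES_ROUTING = """\
# Moe
moe.stooges.com    *                  upstream.com
moe.stooges.com    kang.yoyodyne.com  @HOLD
moe.stooges.com    larry.stooges.com  @DIRECT
moe.stooges.com    curly.stooges.com  @DIRECT
# Larry and Curly
larry.stooges.com  *                  moe.stooges.com
larry.stooges.com  curly.stooges.com  @STORE
curly.stooges.com  *                  moe.stooges.com
curly.stooges.com  larry.stooges.com  @STORE
"""


def parse_routing(text: str) -> List[Rule]:
    """Read whitespace-separated triples, ignoring '#' comments and blank lines."""
    rules = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3:
            rules.append((fields[0], fields[1], fields[2]))
    return rules


def next_hop(rules: List[Rule], node: str, dest: str) -> str:
    """Hop chosen at `node` for data bound to `dest`: the LAST matching rule wins."""
    hop = "@DIRECT"  # assumed default when nothing matches (one-hop behavior)
    for applies_to, dest_pattern, via in rules:
        if fnmatchcase(node, applies_to) and fnmatchcase(dest, dest_pattern):
            hop = via
    return hop


def trace(rules: List[Rule], source: str, dest: str,
          max_hops: int = 16) -> Tuple[List[str], str]:
    """Follow hops until a special @ action is reached; return (nodes, action)."""
    path = [source]
    node = source
    for _ in range(max_hops):
        hop = next_hop(rules, node, dest)
        if hop.startswith("@"):
            return path, hop
        node = hop
        path.append(node)
    raise RuntimeError("possible routing loop: " + " -> ".join(path))
```

For instance, `trace(rules, "larry.stooges.com", "kang.yoyodyne.com")` returns `(["larry.stooges.com", "moe.stooges.com"], "@HOLD")`: Larry forwards to Moe, and Moe's later, more specific rule overrides its generic upstream rule.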
The Kangaroo server, client, and PFS module may all be configured with a number of settings. This section lists all of the various and sundry settings referred to in the manual. There are two ways to change these settings. You may set an environment variable:
setenv KANGAROO_NOTIFY_DELAY 600
Or you may create the file ${KANGAROO_INSTALL_DIR}/etc/kangaroo.config and fill it with configuration settings like this:
kangaroo.notify.delay: 600
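The example suggests that the two forms name the same setting: the config-file key is the environment variable name in lower case, with dots in place of underscores. Below is a sketch of reading a setting either way. The helper names are hypothetical, and the precedence (environment over file) is an assumption, since this manual does not say which form wins when both are present.

```python
import os
from typing import Dict, Mapping, Optional


def env_name(setting: str) -> str:
    """kangaroo.notify.delay -> KANGAROO_NOTIFY_DELAY (the pattern shown above)."""
    return setting.replace(".", "_").upper()


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines from a kangaroo.config-style file."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            settings[name.strip()] = value.strip()
    return settings


def lookup(setting: str, config: Mapping[str, str],
           environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Assumption: an environment variable overrides the config file."""
    return environ.get(env_name(setting), config.get(setting))
```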
The allowed configuration settings are: