Parrot User's Manual

15 September 2003

Parrot is Copyright (C) 2003 Douglas Thain. This program is released under the GNU General Public License. See the file COPYING for details. This manual may be out of date. Please check the Parrot Web Page for the most recent version.

Overview

Parrot is a tool for attaching old programs to new storage systems. Parrot makes a remote storage system appear as a file system to a legacy application. Parrot does not require any special privileges, any recompiling, or any change whatsoever to existing programs. It can be used by normal users doing normal tasks. For example, an anonymous FTP service is made available to vi like so:
% parrot vi /anonftp/ftp.cs.wisc.edu/RoadMap

Parrot is useful to users of distributed systems, because it frees them from rewriting code to work with new systems and relying on remote administrators to trust and install new software. Parrot is also useful to developers of distributed systems, because it allows rapid deployment of new code to real applications and real users that do not have the time, inclination, or permissions to build a kernel-level filesystem.

Parrot currently supports a variety of remote I/O systems, all detailed below. We welcome contributions of new remote I/O drivers from others. If you are working on a protocol driver, please drop us a note so that we can make sure work is not duplicated.

Almost any application - whether statically or dynamically linked, standard or commercial, command-line or GUI - should work with Parrot. There are a few exceptions. Because Parrot relies on the Linux ptrace interface, any program that itself relies on ptrace cannot run under Parrot. This means Parrot cannot run a debugger, nor can it run itself recursively. In addition, Parrot cannot run setuid programs, as the operating system considers this a security risk.

Like any software, Parrot is bound to have some bugs. Please check the known bugs page for the latest scoop.

Installation

Parrot currently runs only on the Linux operating system. It relies on some fairly low level details in order to implement system call trapping. Ports to other platforms that are similar to Linux may be possible in the future.

Parrot might already be installed on your system. To check for Parrot, simply run parrot -v. If you see the following message:

    % parrot -v
    parrot version 0.9.5 built by thain@coral.cs.wisc.edu on Jul 16 2003 at 11:16:58
then Parrot is already installed and you may skip to "Examples" below. If instead, you see this message:
    parrot: Command not found.
then you must install Parrot yourself. Begin by downloading a Parrot package from the Parrot home page. We recommend that you install a binary package.

Installation From Binaries

Simply unpack the tarball in any directory that you like, and then add the bin directory to your path. For example, to install in /home/fred/parrot:
% cd /home/fred
% gunzip parrot-xxx-yyy.tar.gz
% tar xvf parrot-xxx-yyy.tar
% setenv PATH /home/fred/parrot/bin:$PATH
That's all; skip to the Examples section below.

Installation From Source

Building Parrot from source is quite complicated simply because Parrot must pull together a number of completely independent systems that are quite complex themselves. If you really want to do this, begin by grabbing a fresh cup of coffee. Then, download the basic Parrot source:
  1. Parrot (Required)
The basic Parrot package can speak HTTP, FTP, and Chirp. Other protocols require that you download and install a number of other packages:
  1. Kerberos V (Optional)
  2. Globus (Optional) (Must be Globus 2.x)
  3. NeST (Optional)
  4. RFIO (from Castor) (Optional)
  5. DCache (Optional)
Once all these packages are correctly installed, unpack Parrot and then issue a configure command that points to all of the other installations. Then, make and install. For example:
% gunzip parrot-xxx.tar.gz
% tar xvf parrot-xxx.tar
% cd parrot-xxx
% ./configure --prefix /home/fred/parrot --with-globus-path /usr/local/globus ...
% make
% make install

Examples

To use Parrot, you simply use the parrot command followed by any other Unix program. For example, to run a Parrot-enabled vi, execute this command:
% parrot vi /anonftp/ftp.cs.wisc.edu/RoadMap
Of course, it can be clumsy to put parrot before every command you run, so try starting a shell with Parrot already loaded:
% parrot tcsh
Now, you should be able to run any standard command using Parrot filenames. Here are some examples to get you thinking:
% acroread /http/www.cs.wisc.edu/condor/doc/usenix_1.92.pdf
% grep Yahoo /http/www.yahoo.com
% set autolist
% cat /anonftp/ftp.cs.wisc.edu/[Press TAB here]
We have limited the examples so far to HTTP and anonymous FTP, as they are the only services we know that absolutely everyone is familiar with. There are a number of other more powerful and secure remote services that you may be less familiar with. Parrot supports them in the same form: the filename begins with the service type, then the host name, then the file name. Here are all the currently supported services:
example path                                  remote service                             more info
/http/www.yahoo.com/index.html                Hypertext Transfer Protocol                included
/ftp/ftp.cs.wisc.edu/RoadMap                  File Transfer Protocol                     included
/anonftp/ftp.cs.wisc.edu/RoadMap              Anonymous File Transfer Protocol           included
/chirp/target.cs.wisc.edu/path                Condor Chirp I/O                           included
/gsiftp/ftp.globus.org/path                   Globus Security + File Transfer Protocol   more info
/nest/nest.cs.wisc.edu/path                   Network Storage Technology                 more info
/rfio/host.cern.ch/path                       Castor Remote File I/O                     more info
/dcap/dcap.cs.wisc.edu/pnfs/cs.wisc.edu/path  DCache Access Protocol                     more info
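
The naming convention above (service type, then host name, then file name) can be illustrated with a short Python sketch. This is not Parrot's own code: the function name split_parrot_path and the ParrotPath type are hypothetical, and the set of prefixes is simply copied from the list of supported services.

```python
from typing import NamedTuple, Optional

# Service prefixes copied from the list of supported services.
KNOWN_SERVICES = {"http", "ftp", "anonftp", "chirp", "gsiftp", "nest", "rfio", "dcap"}


class ParrotPath(NamedTuple):
    service: str  # e.g. "anonftp"
    host: str     # e.g. "ftp.cs.wisc.edu"
    path: str     # path on the remote service, always starting with "/"


def split_parrot_path(name: str) -> Optional[ParrotPath]:
    """Split /service/host/rest into its parts; return None for ordinary paths."""
    parts = name.split("/", 3)  # ['', service, host, rest]
    if len(parts) < 3 or parts[0] != "" or parts[1] not in KNOWN_SERVICES:
        return None
    if not parts[2]:
        return None  # a service prefix with no host name
    rest = "/" + parts[3] if len(parts) == 4 else "/"
    return ParrotPath(parts[1], parts[2], rest)


print(split_parrot_path("/anonftp/ftp.cs.wisc.edu/RoadMap"))
print(split_parrot_path("/etc/passwd"))  # an ordinary local path
```

A name that does not begin with a known service prefix is treated here as an ordinary local file name.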

You will notice quite quickly that few remote I/O systems provide all of the functionality common to an ordinary file system. For example, HTTP is incapable of listing files. (This is a design limitation of HTTP, not a bug in Parrot.) If you attempt to perform a directory listing on an HTTP server, Parrot will attempt to keep ls happy by producing a bogus directory entry:
    % parrot ls -la /http/www.yahoo.com/
    -r--r--r--    1 thain    thain           0 Jul 16 11:50 /http/www.yahoo.com
A less-drastic example is found in FTP. If you attempt to perform a directory listing of an FTP server, Parrot fills in the available information -- the file names and their sizes -- but again inserts bogus information to fill the rest out:
    % parrot ls -la /anonftp/ftp.cs.wisc.edu
    total 0
    -rwxrwxrwx    1 thain    thain        2629 Jul 16 11:53 RoadMap
    -rwxrwxrwx    1 thain    thain     1622222 Jul 16 11:53 ls-lR
    -rwxrwxrwx    1 thain    thain      367507 Jul 16 11:53 ls-lR.Z
    -rwxrwxrwx    1 thain    thain      212125 Jul 16 11:53 ls-lR.gz
If you would like to get a better idea of the underlying behavior of Parrot, try running it with the -d remote option, which will display all of the remote I/O operations that it performs on a program's behalf:
    % parrot -d remote ls -la /anonftp/ftp.cs.wisc.edu
    ...
    ftp.cs.wisc.edu <-- TYPE I
    ftp.cs.wisc.edu --> 200 Type set to I.
    ftp.cs.wisc.edu <-- PASV
    ftp.cs.wisc.edu --> 227 Entering Passive Mode (128,105,2,28,194,103)
    ftp.cs.wisc.edu <-- NLST /
    ftp.cs.wisc.edu --> 150 Opening BINARY mode data connection for file list.
    ...
If your program is upset by the unusual semantics of such storage systems, then consider using the Chirp protocol and server, described in more detail below.
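
When reading a trace like the one above, it can help to decode the FTP replies by hand. For instance, the six numbers in a "227 Entering Passive Mode" reply are the four bytes of the data connection's IP address followed by the high and low bytes of its port, as defined by the FTP standard. The following Python sketch decodes such a line; the function name parse_pasv_reply is hypothetical and is not part of Parrot.

```python
import re
from typing import Tuple


def parse_pasv_reply(line: str) -> Tuple[str, int]:
    """Extract (ip_address, port) from an FTP '227 Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", line)
    if match is None:
        raise ValueError("not a PASV reply: %r" % line)
    h1, h2, h3, h4, p_high, p_low = (int(x) for x in match.groups())
    # The port is sent as two bytes, most significant first.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p_high * 256 + p_low


reply = "227 Entering Passive Mode (128,105,2,28,194,103)"
print(parse_pasv_reply(reply))  # ('128.105.2.28', 49767)
```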

Name Resolution

In addition to accessing remote storage, Parrot allows you to create a custom namespace for any program. All file name activity passes through the Parrot name resolver, which can transform any given filename according to a series of rules that you specify.

The simplest name resolver is the mountlist, given by the -m mountfile option. This file corresponds closely to /etc/fstab in Unix. A mountlist is simply a file with two columns. The first column gives a logical directory or file name, while the second gives the physical path that it must be connected to.

For example, if a database is stored at an FTP server under the path /anonftp/ftp.cs.wisc.edu/db, it may be spliced into the filesystem under /dbase with a mount list like this:

     /dbase       /anonftp/ftp.cs.wisc.edu/db
Instruct Parrot to use the mountlist as follows:
    % parrot -m mountfile tcsh
    % cd /dbase
    % ls -la
A single mount entry may be given on the command line with the -M option as follows:
    % parrot -M /dbase=/anonftp/ftp.cs.wisc.edu/db tcsh
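
To make the two-column mountlist format concrete, here is a minimal Python sketch of how a logical name might be rewritten into a physical one. It is an illustration only, not Parrot's resolver. It assumes that entries are tried in file order, that the first match wins, and that a logical name matches only on whole path components; the real resolver may behave differently. The function names parse_mountlist and resolve are hypothetical.

```python
from typing import List, Tuple


def parse_mountlist(text: str) -> List[Tuple[str, str]]:
    """Read (logical, physical) pairs from two-column mountlist text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:  # skip blank or malformed lines
            continue
        entries.append((fields[0], fields[1]))
    return entries


def resolve(name: str, entries: List[Tuple[str, str]]) -> str:
    """Rewrite a logical name using the first matching entry (an assumption)."""
    for logical, physical in entries:
        prefix = logical.rstrip("/")
        # Match the mount point itself or anything beneath it.
        if name == prefix or name.startswith(prefix + "/"):
            return physical.rstrip("/") + name[len(prefix):]
    return name  # no entry applies, so the name is left unchanged


mounts = parse_mountlist("/dbase       /anonftp/ftp.cs.wisc.edu/db\n")
print(resolve("/dbase/index.dat", mounts))
print(resolve("/home/fred/notes", mounts))
```

With the mountlist from the example above, /dbase/index.dat is rewritten to /anonftp/ftp.cs.wisc.edu/db/index.dat, while names outside /dbase pass through untouched.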

The Chirp Protocol and Server

Most programs are quite satisfied by the Unix emulation performed by Parrot. However, some programs may require access to the full Unix interface in order to perform administrative tasks such as changing file permissions, owners, and so forth. For such programs, we recommend using the Chirp protocol and server, both included with the standard distribution of Parrot.

Chirp is a simple protocol that corresponds closely to the traditional Unix I/O interface, including open(), read(), stat(), readdir(), and so forth. A standalone Chirp server can offer your programs fine-grained file access from anywhere on the network. A Chirp server is started as follows:

    % chirp_server -d all -a my.authfile
The -d all option turns on debugging, which helps you to understand how the server works initially. You may remove this option once everything is working. The -a my.authfile option specifies a file which gives the authentication and authorization policy for the server. More on that in a minute.

Suppose the Chirp server is running on bird.cs.wisc.edu. Using Parrot, you may access all of the Unix features of that host from elsewhere:

    % parrot tcsh
    % cd /chirp/bird.cs.wisc.edu/tmp
    % ls -la
    % ...

Naturally, one should be concerned about the security of such a service. The Chirp server has a flexible security policy which allows you to accept or deny users via one of several authentication schemes. The Chirp server may be a personal server for only you, or it may be run as the superuser and satisfy a number of users. It's up to you.

Here is a summary of the authentication schemes:
Type        Summary                                              Personal?        Multi-User?
kerberos    Centralized private key system                       no               yes (host cert)
globus      Distributed public key system                        yes (user cert)  yes (user cert)
filesystem  Authenticate via a local or distributed filesystem.  yes              yes
hostname    Reverse DNS lookup                                   yes              yes
address     Identify by IP address                               yes              yes

Parrot will attempt all of the authentication types it knows until it successfully connects to a Chirp server. You must explicitly specify the security policy for the Chirp server in an authfile, passed on the command line. An example authfile is distributed with Parrot in etc/chirp.authfile.example.

Here's how it works. Each line in the file has four fields separated by colons: the authentication type, the permitted hostnames, the permitted remote users, and the corresponding local users. Asterisks may be used in the first three fields as wildcards. The fourth field must be either a valid local username or an asterisk, indicating that the local username is chosen by the authentication type. Each line in the file is compared against the calling user in order. If one matches, the user is accepted and assigned the username in the fourth field.

Here are some examples. Suppose that I wish to run a personal server as an ordinary user thain, and I am willing to trust any user calling from two different hosts called red and blue, as well as any hosts that can authenticate with my X.509 identity:

    hostname:red.cs.wisc.edu:red.cs.wisc.edu:thain
    hostname:blue.cs.wisc.edu:blue.cs.wisc.edu:thain
    globus:*:/C=US/O=National Computational Science Alliance/CN=Douglas Thain:thain
Or, suppose that I am running a server as the superuser, and I am willing to trust any user that can authenticate via Kerberos or via the local filesystem if on the same host. In addition, I consider any user on the host operator.cs.wisc.edu to be equivalent to the user named sysop:
    kerberos:*:*:*
    filesystem:bird.cs.wisc.edu:*:*
    hostname:operator.cs.wisc.edu:operator.cs.wisc.edu:sysop
A Chirp server creates a new process for every incoming client. If the server is run as the superuser, the process will setuid to the id of the authenticated user. If the server is run as an ordinary user, it will check that the authenticated user matches the user who owns the server; if not, the connection is declined.
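
The authfile matching rules described above (four colon-separated fields, asterisks as wildcards in the first three, lines compared in order) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Chirp server's implementation: the names Rule, parse_authfile, wildcard_match, and authorize are hypothetical, an asterisk is assumed to match any run of characters, and an asterisk in the fourth field is modeled by simply reusing the remote user name, whereas the real server lets the authentication type choose the local name.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    auth_type: str
    hosts: str
    remote_users: str
    local_user: str


def parse_authfile(text: str) -> List[Rule]:
    """Parse authfile text into rules, one per non-blank line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != 4:
            raise ValueError("expected four colon-separated fields: %r" % line)
        rules.append(Rule(*fields))
    return rules


def wildcard_match(pattern: str, value: str) -> bool:
    """Match value against pattern, where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def authorize(rules: List[Rule], auth_type: str, host: str,
              remote_user: str) -> Optional[str]:
    """Return the local user for the first matching rule, or None if none match."""
    for rule in rules:
        if (wildcard_match(rule.auth_type, auth_type)
                and wildcard_match(rule.hosts, host)
                and wildcard_match(rule.remote_users, remote_user)):
            # Assumption: '*' in the fourth field keeps the remote user name.
            return remote_user if rule.local_user == "*" else rule.local_user
    return None


rules = parse_authfile("""
kerberos:*:*:*
filesystem:bird.cs.wisc.edu:*:*
hostname:operator.cs.wisc.edu:operator.cs.wisc.edu:sysop
""")
print(authorize(rules, "hostname", "operator.cs.wisc.edu", "operator.cs.wisc.edu"))
print(authorize(rules, "filesystem", "other.cs.wisc.edu", "fred"))
```

With the multi-user example above, the first call yields sysop, and the second yields None because filesystem authentication is only accepted from bird.cs.wisc.edu.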

Each of the authentication types has a few things you should know:

Kerberos: The server will attempt to use the Kerberos identity of the host it is run on (e.g. host/coral.cs.wisc.edu@CS.WISC.EDU). Thus, it must be run as the superuser in order to access its certificates.

Globus: The server and client will attempt to perform peer-to-peer authentication using the Grid Security Infrastructure. Both sides must first obtain a proxy certificate by running grid-proxy-init.

Filesystem: This method makes use of an existing filesystem (local or distributed) to establish the client's identity. It assumes that both machines share the same conception of the user database and have a common directory which they can read and write. By default, the server will pick a filename in /tmp, and challenge the client to create that file. If it can, then the server will examine the owner of the file to determine the client's username. Naturally, /tmp will only be available to clients on the same machine. However, if a shared filesystem directory is available, give that to the Chirp server via the -c option. Then, any authorized client of the filesystem can authenticate to the server. For example, at UW, we use -c /afs/cs.wisc.edu/common/tmp to authenticate via our AFS distributed file system.
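
The challenge described above can be sketched in a few lines of Python. This shows only the idea (pick a name, let the client create it, inspect the owner) and not the Chirp wire protocol. The function names make_challenge, answer_challenge, and verify_challenge are hypothetical, and a temporary directory stands in for /tmp or a shared directory.

```python
import os
import secrets
import tempfile


def make_challenge(directory: str) -> str:
    """Server side: pick an unpredictable file name in the shared directory."""
    return os.path.join(directory, "challenge." + secrets.token_hex(8))


def answer_challenge(path: str) -> None:
    """Client side: prove access by creating the file the server named."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify_challenge(path: str) -> int:
    """Server side: read the owner of the created file, then clean up."""
    uid = os.lstat(path).st_uid  # lstat, so a symlink cannot redirect the check
    os.unlink(path)
    return uid


with tempfile.TemporaryDirectory() as shared_dir:
    challenge = make_challenge(shared_dir)
    answer_challenge(challenge)
    print("client uid:", verify_challenge(challenge))
```

The server would then map the numeric owner to a username through the user database that, as noted above, both machines are assumed to share.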

Hostname: The server will rely on a reverse DNS lookup to establish the fully-qualified hostname of the calling client. The fourth field is then used to select an appropriate local username. Notice that the second and third fields of a 'hostname' line in the authfile must be identical.

Address: Like "hostname" authentication, except the server simply looks at the client's IP address.

If you have difficulty getting authorization to work, we recommend that you run the Chirp server and the corresponding Parrot client with the -d auth option. This will show details of all the authentication methods attempted, as well as the lines in the authfile that are accepted or rejected, along with the reasons why.

By default, Parrot will attempt every authentication type that it knows until one succeeds. If you wish to restrict or re-order the authentication types that Parrot uses, give one or more -a options, naming the authentication types to be used, in order. For example, to attempt only hostname and kerberos authentication, in that order:

   % parrot -a hostname -a kerberos tcsh

Options and Environment

Parrot has several command line options and corresponding environment variables.
Option               Purpose                                          Environment Variable
-a <list>            Use these Chirp authentication methods.          PARROT_CHIRP_AUTH
-b <bytes>           Set the recommended remote I/O block size.       PARROT_LOCAL_BLOCK_SIZE
-B <bytes>           Set the recommended local I/O block size.        PARROT_REMOTE_BLOCK_SIZE
-c                   Connect to the local Condor Chirp proxy.         PARROT_RESOLVE_CHIRP
-C <MB>              Set the size of the I/O channel.                 PARROT_CHANNEL_SIZE
-d <system>          Enable debugging for this sub-system.            PARROT_DEBUG_FLAGS
-h                   Show this screen.
-m <file>            Use this file as a mountlist.                    PARROT_MOUNT_FILE
-M <local>=<remote>  Mount this remote file on this local directory.
-o <file>            Send debugging messages to this file.            PARROT_DEBUG_FILE
-p <host:port>       Use this proxy for HTTP requests.                PARROT_HTTP_PROXY
-t <dir>             Where to store temporary files.                  PARROT_TEMP_DIR
-v                   Display version number.

The flexible debugging flags can be a great help in both debugging and understanding Parrot. To turn on multiple debugging flags, you may either issue multiple -d options:

    % parrot -d ftp -d chirp tcsh
Or, you may give a space separated list in the corresponding environment variable:
    % setenv PARROT_DEBUG_FLAGS "ftp chirp"
    % parrot tcsh
Here is the meaning of each of the debug flags.

syscall  This shows all of the system calls attempted by each program, even those that Parrot does not trap or modify. (To see arguments and return values, try -d libcall instead.)
libcall  This shows only the I/O calls that are actually trapped and implemented by Parrot. The arguments and return codes are the logical values seen by the application, not the underlying operations. (To see the underlying operations try -d remote or -d local instead.)
cache    This shows all of the shared segments that are loaded into the channel cache and shared by multiple programs. For most programs, this means all the shared libraries.
process  This shows all process creations, deletions, signals, and process state changes.
resolve  This shows every invocation of the name resolver. A plain file name indicates the name was not modified, while more detailed records show names that were changed or denied access.
local    This shows all local I/O calls from the perspective of Parrot. Notice that the file descriptors and file names shown are internal to Parrot. (To see fds and names from the perspective of the job, try -d libcall.)
remote   This shows all non-local file activity.
http     This shows only HTTP operations.
ftp      This shows only FTP operations.
nest     This shows only NeST operations.
chirp    This shows only Chirp operations.
rfio     This shows only RFIO operations.
poll     This shows all activity related to processes that block (explicitly or implicitly) waiting for I/O.
time     This adds the current time to every debug message.
pid      This adds the calling process id to every debug message.
all      This shows all possible debugging messages.
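
If you launch Parrot from a script, the two ways of selecting debug flags described above (repeated -d options, or a space separated list in PARROT_DEBUG_FLAGS) are easy to generate. The Python sketch below only builds the argument list and the environment entry; it does not run Parrot. The helper names debug_args and debug_env are hypothetical.

```python
from typing import Dict, List


def debug_args(flags: List[str]) -> List[str]:
    """Build one '-d <flag>' pair per debug flag."""
    args: List[str] = []
    for flag in flags:
        args += ["-d", flag]
    return args


def debug_env(flags: List[str]) -> Dict[str, str]:
    """Build the equivalent space separated PARROT_DEBUG_FLAGS setting."""
    return {"PARROT_DEBUG_FLAGS": " ".join(flags)}


command = ["parrot", *debug_args(["ftp", "chirp"]), "tcsh"]
print(command)                        # ['parrot', '-d', 'ftp', '-d', 'chirp', 'tcsh']
print(debug_env(["ftp", "chirp"]))    # {'PARROT_DEBUG_FLAGS': 'ftp chirp'}
```

Either result could then be handed to a process launcher such as Python's subprocess.run, as the argument list or merged into a copy of os.environ.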