Next: 7.6 Managing Large Workflows Up: 7. Frequently Asked Questions Previous: 7.4 Condor on Windows   Contents   Index


7.5 Grid Computing

What must be installed to access grid resources?

The minimum software needed is a single machine with Condor installed and configured so that jobs can be submitted from it. If matchmaking or glidein is desired, that same machine must also fill the role of a central manager. A Personal Condor installation can satisfy both requirements.
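As a rough check, the daemons named in a machine's DAEMON_LIST configuration variable show which roles it fills: a condor_schedd is needed to submit jobs, and a condor_collector together with a condor_negotiator make the machine a central manager. The sketch below assumes a POSIX shell with condor_config_val on the PATH; the function name check_roles is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which Condor roles a DAEMON_LIST value covers.
# A SCHEDD is needed to submit jobs; COLLECTOR plus NEGOTIATOR make the
# machine a central manager.
check_roles() {
    # Turn commas into spaces and upper-case the daemon names.
    daemons=$(printf '%s\n' "$1" | tr ',' ' ' | tr '[:lower:]' '[:upper:]')
    submit=no collector=no negotiator=no
    for d in $daemons; do
        case $d in
            SCHEDD) submit=yes ;;
            COLLECTOR) collector=yes ;;
            NEGOTIATOR) negotiator=yes ;;
        esac
    done
    central=no
    if [ "$collector" = yes ] && [ "$negotiator" = yes ]; then
        central=yes
    fi
    echo "submit: $submit"
    echo "central manager: $central"
}
```

On a live machine one might run check_roles "$(condor_config_val DAEMON_LIST)". For example, a DAEMON_LIST of MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD prints "submit: yes" and "central manager: yes".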

I am the administrator at Physics, and I have a 64-node cluster running Condor. The administrator at Chemistry is also running Condor on her 64-node cluster. We would like to be able to share resources. How do we do this?

Condor's flocking feature allows multiple Condor pools to share resources. By setting configuration variables in each pool, the administrators can allow jobs submitted in one pool to execute on the other pool's cluster as well. See section 5.2 of the manual, on flocking, for details.
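As a minimal sketch, two-way sharing is usually set up with the FLOCK_TO and FLOCK_FROM configuration variables: Physics lists the Chemistry central manager in FLOCK_TO and the Chemistry submit machines in FLOCK_FROM, and Chemistry does the converse. The helper name add_flocking, the host names, and the local configuration file path are hypothetical. Depending on the Condor version, host-based security settings (for example HOSTALLOW_WRITE) may also need to admit the other pool's machines, so check section 5.2 before relying on this.

```shell
#!/bin/sh
# Hypothetical helper: append flocking settings to a Condor local config file.
#   $1 = local configuration file to append to
#   $2 = central manager of the other pool (jobs may flock there)
#   $3 = submit machine(s) of the other pool (may flock jobs here)
add_flocking() {
    cat >> "$1" <<EOF
# Idle jobs submitted here may flock to this central manager.
FLOCK_TO = $2
# Submit machines in the other pool allowed to flock jobs to this pool.
FLOCK_FROM = $3
EOF
}
```

On the Physics side one might run add_flocking /path/to/condor_config.local cm.chemistry.example.edu submit.chemistry.example.edu, and then condor_reconfig so the running daemons reread their configuration.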

What is glidein?

Glidein temporarily adds a remote resource to a local Condor pool. It uses Globus resource-management software to run jobs on the resource. The first jobs it runs are the Condor daemons themselves, so that Condor starts on the resource, configured as part of the local pool. Condor can then execute the user's jobs there. Working this way has several benefits: standard universe jobs may be submitted to run on the resource, and Condor can dynamically schedule jobs across the grid.

See section 5.4 of the manual, on Glidein, for further information.


Using my Globus gatekeeper to submit jobs to the Condor pool does not work. What is wrong?

If you see either of the following error messages, the Condor configuration file is in a non-standard location and the Globus software does not know how to find it.

First error message:

% globus-job-run \
  globus-gate-keeper.example.com/jobmanager-condor /bin/date

Neither the environment variable CONDOR_CONFIG, /etc/condor/,
nor ~condor/ contain a condor_config file.  Either set
CONDOR_CONFIG to point to a valid config file, or put a
"condor_config" file in /etc/condor or ~condor/ Exiting.

GRAM Job failed because the job failed when the job manager
attempted to run it (error code 17)

Second error message:

% globus-job-run \
   globus-gate-keeper.example.com/jobmanager-condor /bin/date

ERROR: Can't find address of local schedd
GRAM Job failed because the job failed when the job manager
attempted to run it (error code 17)

As described in section 3.2.2, Condor searches for its configuration file in the following order.

  1. File specified in the CONDOR_CONFIG environment variable
  2. /etc/condor/condor_config
  3. ~condor/condor_config
  4. $(GLOBUS_LOCATION)/etc/condor_config
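To see which file, if any, a given environment would pick up, the search order above can be approximated in a few lines of shell. This is a sketch only: the function name find_condor_config is hypothetical, and for simplicity an unreadable $CONDOR_CONFIG falls through to the next candidate here, which real Condor may not do.

```shell
#!/bin/sh
# Hypothetical sketch of Condor's configuration-file search order.
# Prints the first readable candidate; returns 1 if none is found.
find_condor_config() {
    # 1. The CONDOR_CONFIG environment variable.
    if [ -n "$CONDOR_CONFIG" ] && [ -r "$CONDOR_CONFIG" ]; then
        echo "$CONDOR_CONFIG"
        return 0
    fi
    # 2-4. Fixed locations; ~condor expands only if the condor user exists,
    # and the last entry is empty when GLOBUS_LOCATION is unset.
    for f in /etc/condor/condor_config ~condor/condor_config \
             "${GLOBUS_LOCATION:+$GLOBUS_LOCATION/etc/condor_config}"; do
        if [ -n "$f" ] && [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

Running find_condor_config in an environment similar to the one globus-gatekeeper receives gives a quick indication of whether the job manager will find a configuration file at all.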

Because the configuration file is not in a standard location, you will need to set the CONDOR_CONFIG environment variable, either by hand or in an initialization script. Use one of the following approaches.

  1. Wherever globus-gatekeeper is launched, replace it with a minimal shell script that sets CONDOR_CONFIG and then starts globus-gatekeeper. Something like the following should work:

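      #! /bin/sh
      # Point Condor at its non-standard configuration file.
      CONDOR_CONFIG=/path/to/condor_config
      export CONDOR_CONFIG
      # Replace this shell with the real gatekeeper, passing all arguments through.
      exec /path/to/globus/sbin/globus-gatekeeper "$@"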
    
  2. If you start globus-gatekeeper from inetd, xinetd, or a similar program, set the environment variable there. With inetd, use the env program to set the environment, as in the following example. The entry is shown on multiple lines here, but it must be a single line in the inetd configuration.
    globus-gatekeeper stream tcp nowait root /usr/bin/env
    env CONDOR_CONFIG=/path/to/condor_config
    /path/to/globus/sbin/globus-gatekeeper
    -conf /path/to/globus/etc/globus-gatekeeper.conf
    
    If you are using xinetd, add an env setting such as the following:
    service gsigatekeeper
    {
        env = CONDOR_CONFIG=/path/to/condor_config
        cps = 1000 1
        disable = no
        instances = UNLIMITED
        max_load = 300
        nice = 10
        protocol = tcp
        server = /path/to/globus/sbin/globus-gatekeeper
        server_args = -conf /path/to/globus/etc/globus-gatekeeper.conf
        socket_type = stream
        user = root
        wait = no
    }
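Because the inetd entry must be a single line, it can help to generate it from the two paths it depends on. The helper below is hypothetical; it assumes a Globus installation with sbin/ and etc/ directly under one directory, as in the examples above, and it only prints the line, leaving the edit of the inetd configuration to the administrator.

```shell
#!/bin/sh
# Hypothetical helper: print the single-line inetd entry described above.
#   $1 = path to condor_config
#   $2 = Globus installation directory (containing sbin/ and etc/)
gatekeeper_inetd_line() {
    printf '%s\n' "globus-gatekeeper stream tcp nowait root /usr/bin/env env CONDOR_CONFIG=$1 $2/sbin/globus-gatekeeper -conf $2/etc/globus-gatekeeper.conf"
}
```

After the printed line is added to the inetd configuration file, inetd must reread its configuration, typically by sending it a HUP signal.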
    


condor-admin@cs.wisc.edu