5.2 Connecting Condor Pools with Flocking

Flocking is Condor's way of allowing jobs that cannot immediately run (within the pool of machines where the job was submitted) to instead run on a different Condor pool. If a machine within Condor pool A can send jobs to be run on Condor pool B, then we say that jobs from machine A flock to pool B. Flocking can occur in a one-way manner, such as jobs from machine A flocking to pool B, or it can be set up to flock in both directions. Configuration variables allow the condor_schedd daemon (which runs on each machine that may submit jobs) to implement flocking.

NOTE: Flocking to pools which use Condor's high availability mechanisms is not advised in current versions of Condor. See section 3.11.2 ``High Availability of the Central Manager'' of the Condor manual for a discussion of these problems.


5.2.1 Flocking Configuration

The simplest flocking configuration sets a few configuration variables. If jobs from machine A are to flock to pool B, then set the following configuration variables in machine A's configuration (a complete example appears after the list):

FLOCK_TO
is a comma-separated list of the central manager machines of the pools that jobs from machine A may flock to.
FLOCK_COLLECTOR_HOSTS
is the list of condor_collector daemons within the pools that jobs from machine A may flock to. In most cases, it is the same as FLOCK_TO, and it would be defined with
  FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS
is the list of condor_negotiator daemons within the pools that jobs from machine A may flock to. In most cases, it is the same as FLOCK_TO, and it would be defined with
  FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD
provides a host-based access level and authorization list that allows the condor_schedd daemon to negotiate (for security reasons) with the machines within the pools that jobs from machine A may flock to. This configuration variable will not likely need to change from its default value, as given in the sample configuration:
  ##  Now, with flocking we need to let the SCHEDD trust the other
  ##  negotiators we are flocking with as well.  You should normally
  ##  not have to change this either.
  HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
This example configuration presumes that the condor_collector and condor_negotiator daemons are running on the same machine. See section 3.6.7 for a discussion of security macros and their use.
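
As an illustration, the relevant part of machine A's configuration might look like the following sketch; the central manager name cm-b.example.edu for pool B is a hypothetical value.

## Hypothetical machine A configuration: allow jobs to flock to
## pool B, whose central manager is cm-b.example.edu.
FLOCK_TO               = cm-b.example.edu
FLOCK_COLLECTOR_HOSTS  = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
## The default value is normally adequate.
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)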

The configuration macros that must be set in pool B are ones that authorize jobs from machine A to flock to pool B.

The host-based configuration macros are most easily set by introducing a list of machines from which jobs may flock. FLOCK_FROM is a comma-separated list of machines, and it is used in the default settings of the security macros that do host-based authorization:

HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)

Wild cards may be used when setting the FLOCK_FROM configuration variable. For example, *.cs.wisc.edu specifies all hosts from the cs.wisc.edu domain.
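
For example, a pool B configuration fragment that authorizes flocked jobs might resemble the following sketch; the host and domain names are hypothetical.

## Hypothetical pool B setting: accept flocked jobs from machine A
## and from any host in the cs.wisc.edu domain.
FLOCK_FROM = submit-a.example.edu, *.cs.wisc.edu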

If the user-based configuration macros for security are used, then the default will be:

ALLOW_NEGOTIATOR =  $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)

Further, if Kerberos or GSI authentication is used, then the setting becomes:

ALLOW_NEGOTIATOR = condor@$(UID_DOMAIN)/$(COLLECTOR_HOST)

To enable flocking in both directions, consider each direction separately, following the guidelines given.
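
A minimal sketch of two-way flocking, using hypothetical central manager and domain names, combines both sets of variables in each pool's configuration:

## Pool A configuration (hypothetical names): submit machines flock
## to pool B, and pool A authorizes jobs flocking in from pool B.
FLOCK_TO               = cm-b.example.edu
FLOCK_COLLECTOR_HOSTS  = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
FLOCK_FROM             = *.pool-b.example.edu

## Pool B configuration, mirroring pool A.
FLOCK_TO               = cm-a.example.edu
FLOCK_COLLECTOR_HOSTS  = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
FLOCK_FROM             = *.pool-a.example.edu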


5.2.2 Job Considerations

A particular job will only flock to another pool when it cannot immediately run in the pool where it was submitted.

At one point, all jobs that utilized flocking were standard universe jobs. This is no longer the case. The submission of jobs under other universes must take into account the location of input, output, and error files. The common case will be that machines within separate pools do not share a file system. Therefore, when submitting jobs, the user will need to use Condor's file transfer mechanisms. These mechanisms are discussed in section 2.5.4.
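
As a sketch, a vanilla universe submit description file that relies on Condor's file transfer rather than a shared file system might look like the following; the executable and file names are hypothetical.

## Hypothetical vanilla universe job; input and output files are
## transferred by Condor rather than read from a shared file system.
universe                = vanilla
executable              = analyze
transfer_input_files    = data.in
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = analyze.out
error                   = analyze.err
log                     = analyze.log
queue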

