Next: 1.4 Current Limitations
Up: 1. Overview
Previous: 1.2 Condor's Power
Contents
Index
- Checkpoint and Migration.
- Where programs can be
linked with Condor libraries, users of Condor may be assured that
their jobs will eventually complete,
even in the ever-changing environment that Condor
utilizes.
When a machine running a job submitted to Condor
becomes unavailable,
the job can be checkpointed.
The job may then continue after migrating
to another machine.
Condor's checkpoint feature
also periodically checkpoints a job even when no migration takes place, in order to
safeguard the accumulated computation time of a job from being lost in the
event of a system failure, such as the machine being shut down or crashing.
- Remote System Calls.
- Despite running jobs on remote machines,
the Condor standard universe execution
mode preserves the local execution environment
via remote system calls. Users do not have to worry
about making data files available to remote workstations or even
obtaining a login account on remote workstations before Condor executes
their programs there. The program behaves under Condor as if it were
running as the user that submitted the job on the workstation where it
was originally submitted, no matter which machine it actually ends up
executing on.
- No Changes Necessary to User's Source Code.
- No special
programming is required to use Condor.
Condor is able to run non-interactive programs.
The checkpoint and migration of
programs by Condor is transparent and automatic, as is the use of
remote system calls.
If these facilities are desired, the user only needs to
re-link the program. The code is neither recompiled nor changed.
- Pools of Machines can be Hooked Together.
- Flocking is
a feature of Condor that allows jobs submitted within a first pool of
Condor machines to execute on a second pool.
The mechanism is flexible, following requests from the job
submission,
while allowing the second pool, or a subset of machines within
the second pool, to set policies over the conditions under
which jobs are executed.
- Jobs can be Ordered.
- The ordering of job execution
required by dependencies among jobs in a set is easily handled.
The set of jobs is specified using a directed acyclic graph,
where each job is a node in the graph.
Jobs are submitted to Condor following the dependencies given
by the graph.
- Condor Enables Grid Computing.
- As grid computing
becomes a reality, Condor is already there.
The technique of glidein allows jobs submitted to Condor
to be executed on grid machines in various locations worldwide.
As the details of grid computing evolve, so does Condor's
ability to use grid resources, starting with Globus-controlled resources.
- Sensitive to the Desires of Machine Owners.
- The
owner of a machine has complete priority over the use
of the machine.
An owner is generally happy to let others compute on
the machine while it is idle, but wants it back
promptly upon returning. The owner does not want to take special
action to regain control. Condor handles this automatically.
- ClassAds.
- The ClassAd mechanism
in Condor provides an extremely
flexible, expressive framework for matchmaking
resource requests with resource offers.
Users can easily state both job requirements and job preferences.
For example, a user can require that a job run on a machine
with 64 Mbytes of RAM,
but state a preference for 128 Mbytes, if available.
A workstation owner
can state a preference that the workstation runs jobs
from a specified set of users.
The owner can also require that there be no interactive workstation
activity detectable at certain hours before Condor can
start a job.
Job requirements/preferences and resource availability constraints can be
described in terms of powerful expressions, resulting in
Condor's adaptation to nearly any desired policy.
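The memory example in the ClassAds entry above can be illustrated with a small Python model. This is only a sketch of the idea of separating a hard requirement from a soft preference; it does not use Condor's ClassAd language, and the names Machine, Job, requirements, rank, and match are hypothetical, chosen for illustration. Real matchmaking is two-sided, since the machine owner's constraints are evaluated as well, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    # Hypothetical resource offer: a name and the RAM it advertises.
    name: str
    memory_mb: int


@dataclass
class Job:
    # Hard constraint: only machines for which this returns True are candidates.
    requirements: Callable[[Machine], bool]
    # Soft preference: among candidates, a higher value is preferred.
    rank: Callable[[Machine], float]


def match(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Return the preferred machine that satisfies the job, or None."""
    candidates = [m for m in machines if job.requirements(m)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


# Require at least 64 Mbytes of RAM, but prefer 128 Mbytes if available.
job = Job(
    requirements=lambda m: m.memory_mb >= 64,
    rank=lambda m: 1.0 if m.memory_mb >= 128 else 0.0,
)

pool = [Machine("a", 32), Machine("b", 64), Machine("c", 128)]
chosen = match(job, pool)
```

With this pool, the 32 Mbyte machine is excluded by the requirement and the 128 Mbyte machine wins on preference; if only the 64 Mbyte machine were acceptable, it would still be chosen.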
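The ordering described under "Jobs can be Ordered" above can also be sketched in Python. The example below is a minimal illustration of submitting jobs in dependency order from a directed acyclic graph; it is not Condor's own input format or scheduler. The job names and the helper submission_batches are assumptions made for the example, and it relies on the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each key maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}


def submission_batches(deps):
    """Group jobs into batches; a job appears only after all its parents.

    Raises graphlib.CycleError (a ValueError) if the graph is not acyclic.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Jobs whose dependencies have all been marked done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # In a real system this would happen when the jobs actually finish.
        sorter.done(*ready)
    return batches


batches = submission_batches(dependencies)
```

Each batch holds jobs that could be submitted together, because every job they depend on has already completed in an earlier batch.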
condor-admin@cs.wisc.edu