Although Condor can schedule and
run any type of process, Condor does have some limitations on jobs that it can
transparently checkpoint and migrate:
- Multi-process jobs are not allowed. This includes system calls such as
fork(), exec(), and system().
- Interprocess communication is not allowed. This includes pipes, semaphores, and shared memory.
- Network communication must be brief. A job may make network
connections using system calls such as socket(), but a network
connection left open for long periods will delay checkpointing and migration.
- Sending or receiving the SIGUSR2 or SIGTSTP signals is not allowed.
Condor reserves these signals for its own use. Sending or receiving all
other signals is allowed.
- Alarms, timers, and sleeping are not allowed. This includes system
calls such as alarm(), getitimer(), and sleep().
- Multiple kernel-level threads are not allowed. However,
multiple user-level threads are allowed.
- Memory mapped files are not allowed. This includes system calls such
as mmap() and munmap().
- File locks are allowed, but not retained between checkpoints.
- All files should be opened either read-only or write-only. If a job opens
a file for both reading and writing and must later be rolled back to an older
checkpoint image, the file's contents may no longer match the state saved in
that image. For compatibility reasons, opening a file for both reading and
writing produces a warning rather than an error.
- A fair amount of disk space must be available on the submitting machine
for storing a job's checkpoint images. A checkpoint image is approximately
equal in size to the virtual memory consumed by the job while it runs. If disk space
is short, a special checkpoint server can be designated for storing
all the checkpoint images for a pool.
- On Linux, the job must be statically linked.
condor_compile does this by default.
- Reading from or writing to files larger than 2 GBytes is supported only
when both the submit-side condor_shadow and the standard universe job
itself are 64-bit executables.
Note: these limitations apply only to jobs that Condor
has been asked to transparently checkpoint. If job checkpointing is not
desired, the limitations above do not apply.