8.6 Stable Release Series 7.4
This is a stable release series of Condor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 7.4.x releases.
New features will be added in the 7.5.x development series.
The details of each version are described below.
Version 7.4.5
Release Notes:
- Condor version 7.4.5 not yet released.
New Features:
- condor_dagman now prints a message in the dagman.out file
whenever it truncates a node job user log file.
- condor_dagman now prints additional diagnostic information in the
case of certain log file errors.
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- A network disconnect between the submit machine and execute
machine during the transfer of output files caused the
condor_starter daemon to immediately give up, rather than waiting
for the condor_shadow to reconnect. This problem was introduced
in Condor version 7.4.4.
- If condor_ssh_to_job attempted to connect to a job while the
job's input files were being transferred, this caused the file
transfer to fail, which resulted in the job returning to the idle
state in the queue.
- In privsep mode, the transfer of output failed if a job's execute
directory contained symbolic links to non-existent paths.
Known Bugs:
Additions and Changes to the Manual:
Version 7.4.4
Release Notes:
New Features:
- load_profile is now supported by the Unix version of
condor_submit when submitting jobs to Windows. Previously, this command
was only supported by the Windows version of condor_submit.
- Added an example Mac OS X launchd configuration file for starting Condor.
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed bad behavior in condor_quill where, under certain error
conditions, many copies of the schedd_sql.log file would be
inserted into the database, filling up the disk volume used by the
database. As a consequence of this bug fix, the LogBody column
for each row in the Error_SqlLogs table will be empty. Please
consult the condor_quill daemon log file for the error instead.
- Fixed a bug with how the standard universe
remote system call getrlimit() functioned.
Under certain conditions with
32-bit and 64-bit standard universe binaries,
getrlimit() would erroneously report 2147483647 bytes as a limit,
when RLIM_INFINITY should have been the correct response.
- Fixed a misleading error message issued by condor_run,
which stated
The DAGMan job was aborted by the user.
when the job submitted by condor_run was aborted by the user.
It now states
The job was aborted by the user.
- When the condor_startd daemon is running with an execute directory on
a very large file system, with more than 32 bits worth of free blocks
on a 32-bit system, it would incorrectly report 0 free bytes. This
has been fixed.
- For spooled jobs, input files were sometimes transferred twice from
the submit machine to the execute machine. This happened if the input files
were specified without any path information,
as with a file name with no directory specified.
This problem has existed since before Condor version 7.0.0.
- The configuration variable NETWORK_INTERFACE did not
work in some situations, because of Condor's attempts to
automatically rewrite published addresses to match the IP address of
the network interface used to make the publication.
- Fixed a bug in which the default unit of configuration variable
STARTD_CRON_TEST_PERIOD
should have been seconds, but instead was Undefined.
- Fixed a bug in which condor_submit checked for bad condor_schedd cron
arguments incorrectly within a submit description file.
Now condor_submit will detect the problem and print out an error message.
- With some versions of ssh, condor_ssh_to_job failed if
the SHELL environment variable was set to /bin/csh.
- Submission of vm universe jobs via Globus was not possible,
because the Globus Condor jobmanager explicitly set the input, output,
and error files to /dev/null,
and condor_submit refused any setting of these files for
vm universe jobs.
Now, /dev/null is an allowed setting for the input, output,
and error files for vm universe jobs.
- Fixed a bug that caused a vm universe job's output files
to be incorrectly transferred back to the submit machine,
when the submit description file command vm_no_output_vm
was set to false,
indicating that no files should be transferred.
- String literals within $$([]) expressions within a submit
description file failed to be evaluated and resulted in the job going on hold.
This problem has existed since before Condor 7.0.0.
- condor_preen was not able to clean up files in the EXECUTE
directory when in privsep mode.
- A problem was fixed that could cause a Condor daemon that
connects to other daemons via CCB to permanently run out of space
for more registered sockets until restarted. This problem appeared
in the logs as the following message:
file descriptor safety level exceeded
- Fixed a problem that could cause the condor_collector to crash
when receiving updated matchmaking information for offline ClassAds that do
not exist.
- condor_ssh_to_job did not work when
SEC_DEFAULT_NEGOTIATION was set to OPTIONAL.
- The vm universe now works properly on machines that
have Condor's Privilege Separation mechanism enabled.
- condor_submit no longer pads a vm universe job's disk usage
estimation by 100MB.
- Fixed a bug with the vm_cdrom_files submit file
command, that caused VMware vm universe jobs to fail if the virtual
machine already had a CD-ROM image associated with it.
- Configuration variables SOAP_SSL_CA_DIR and
SOAP_SSL_CA_FILE are now properly used when authenticating
with Amazon EC2 servers.
- Fixed a bug with the <subsys>_LOCK configuration variable.
It could let daemons writing to the same daemon log overwrite each other's
entries and cause daemons to exit when the log is rotated.
- Fixed a bug that caused nordugrid jobs to fail if the
grid_resource attribute included a port as part of the server
host name.
- Fixed a confusing error message mentioning
LocalUserLog::logStarterError() when the condor_starter fails to
communicate with the condor_shadow before the job has started.
- Fixed the event log and shadow log for standard universe jobs to
identify the checkpoint server on which a job might have failed to store
its checkpoint or from which it might have failed to restore its checkpoint.
- Fixed a bug in the condor_gridmanager that could cause it to crash
while handling grid-type cream jobs.
- Improved the condor_gridmanager's handling of grid-type cream jobs
that are held or removed by the user. Canceling the cream job is much less
likely to fail and jobs can no longer get stuck in the cream state of
CANCELED.
- Fixed the web server feature controlled by ENABLE_WEB_SERVER.
Previously, all HTTP GET requests would fail on non-Linux Unix machines.
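The getrlimit() fix in the list above turns on the difference between a finite limit of 2147483647 bytes (which is 2^31 - 1) and the sentinel RLIM_INFINITY. The following is a minimal sketch of that distinction using Python's standard resource module on a Unix host. It does not exercise Condor's standard universe remote system calls, and describe_limit is a hypothetical helper introduced only for illustration.

```python
import resource  # standard library, available on Unix platforms only

# The value from the release note: the largest signed 32-bit integer.
INT32_MAX = 2**31 - 1


def describe_limit(value):
    """Render a resource limit, treating RLIM_INFINITY as 'no limit'.

    Hypothetical helper: it only shows why a finite 2147483647 and
    RLIM_INFINITY must not be confused with one another.
    """
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


# Query the file-size limit of the current process as a live example.
soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
print("soft:", describe_limit(soft))
print("hard:", describe_limit(hard))
```

A program that receives 2147483647 where RLIM_INFINITY was intended would treat the limit as roughly 2 GiB rather than as absent.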
Known Bugs:
Additions and Changes to the Manual:
- The Windows platform installation instructions have been updated.
- Section 2.5.4 on Condor's File Transfer Mechanism
has been revised and updated.
- Section 4.1.4, which provides examples of using
ClassAd expressions within the -constraint option of the condor_q
and condor_status commands, has been expanded to clarify both
Unix and Windows platform specifics.
Version 7.4.3
Release Notes:
- Condor version 7.4.3 released on August 16, 2010.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable ENABLE_CHIRP
defaults to True.
An administrator may set it to False, which
disables Chirp remote file access from execute machines.
- The new configuration variable
ENABLE_ADDRESS_REWRITING defaults to True. It may
be set to False to disable Condor's dynamic algorithm for choosing
which IP address to publish in multi-homed environments. The dynamic
algorithm chooses the IP address associated with the network interface
used to make the publication, for example, the network interface used
to communicate with the condor_collector.
- Configuration variable VM_BRIDGE_SCRIPT has been removed
and is no longer valid.
- The new configuration variable
VM_NETWORKING_BRIDGE_INTERFACE specifies the networking interface
that Xen or KVM vm universe jobs will use.
See section 3.3.29 for documentation.
- Allowed the configuration file entries GSI_DAEMON_TRUSTED_CA_DIR
and GSI_DAEMON_DIRECTORY to be set with environment variables,
as the rest of Condor configuration variables can be.
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
- Searching the PDF version of the manual for items containing
underscore characters, such as many configuration variable names,
now works correctly.
- The new subsection 4.1.3 provides examples of
evaluation results when using the operators ==, =?=,
!=, and =!=.
- Section 2.11 with specifics on vm
universe jobs has been updated to contain more details about
both checkpoints and vm universe jobs in general.
Version 7.4.2
Release Notes:
- Condor version 7.4.2 released on April 6, 2010.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- When WANT_SUSPEND is defined and evaluates to anything
other than the value True,
it is now treated as if it were False.
If WANT_SUSPEND is not explicitly defined,
the condor_startd still exits with an error message.
Previously, an evaluation result of Undefined was also treated as an error,
which caused the condor_startd to exit with an error message.
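The WANT_SUSPEND rule above can be sketched as a small decision function. This is an illustrative model written in Python, not the condor_startd implementation; the sentinels UNDEFINED and NOT_CONFIGURED and the function effective_want_suspend are hypothetical names standing in for a ClassAd Undefined result and for a variable that is absent from the configuration.

```python
# Hypothetical sentinels for illustration only.
UNDEFINED = object()       # the expression evaluated to ClassAd Undefined
NOT_CONFIGURED = object()  # WANT_SUSPEND does not appear in the configuration


def effective_want_suspend(value):
    """Model the 7.4.2 rule as described in the release note.

    - not defined at all     -> error (the condor_startd exits)
    - evaluates to True      -> True
    - evaluates to anything
      else, incl. Undefined  -> treated as False
    """
    if value is NOT_CONFIGURED:
        raise RuntimeError("WANT_SUSPEND is not defined")
    return value is True


for candidate in (True, False, UNDEFINED, "yes"):
    print(repr(candidate), "->", effective_want_suspend(candidate))
```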
Bugs Fixed:
- Fixed a bug in which the condor_schedd would sometimes negotiate
for and try to run
more jobs than specified by MAX_JOBS_RUNNING. Once the
jobs started running, it would then kill them off to get back below
the limit. This was more likely to happen with slow preemption
caused by MaxJobRetirementTime or by a large timeout
imposed by KILL. This problem has existed since before
Condor 6.5. When this problem happened, the following message
appeared in the condor_schedd log:
Preempting X jobs due to MAX_JOBS_RUNNING change
- Fixed a problem that caused condor_ssh_to_job to fail to connect
to a job running on a slot with multiple '@' signs in its name. This bug
has existed since the introduction of condor_ssh_to_job in 7.3.2.
- In all previous versions of Condor, condor_status refused to
accept -long, -xml, and -format when followed by
an argument such as -master that specified which type of
daemon to look at. The order of the arguments had to be reversed or
it would produce a message such as the following:
Error: arg 4 (-master) contradicts arg 1 (-format)
- Fixed a bug which caused the condor_master to crash if
VIEW_SERVER was included in DAEMON_LIST and
CONDOR_VIEW_HOST was unset.
- Fixed a bug that caused configuration parameter
LOCAL_CONFIG_DIR to be ignored if it was set in a local
configuration file, as opposed to the top-level configuration file.
- Fixed a bug that could cause the condor_schedd to behave
incorrectly when reading an invalid job queue log on startup.
- Fixed a bug that could corrupt the job queue log
if the condor_schedd daemon's attempt to compact it fails.
- Fixed a problem that in rare cases caused the condor_schedd to
crash shortly after the condor_gridmanager exited.
This bug has existed since before Condor version 6.8.
- Fixed a problem that was resulting in messages such as the following:
ERROR: receiving new UDP message but found a long message still waiting
to be closed (consumed=0). Closing it now.
- The file extension specified to condor_fetch_log can no longer
contain a path delimiter.
- When in graceful shutdown mode, the condor_schedd was
sometimes starting idle scheduler universe jobs. With a large
enough number of scheduler universe jobs, this could lead to a cycle
of stopping and restarting jobs until the graceful shutdown time
expired.
- Fixed multiple bugs that prevented Condor from building on or
running correctly on OpenSolaris X86/64 version 2009.06.
- Fixed a bug which caused the condor_startd to incorrectly
count the number of processors on some machines with
Hyper-threading enabled. This bug was introduced in
Condor version 7.3.2, and exists in 7.4.0 and 7.4.1.
- Fixed a problem with GSI authentication in Condor that would cause
daemons to consume more and more memory over time. The biggest source
of trouble was introduced in Condor version 7.3.2.
However, a smaller memory leak that
existed in all previous versions of Condor has also been fixed.
- Fixed a bug where if condor_compile is invoked in a manner such as:
condor_compile gcc -print-prog-name=ld
an error would be emitted,
and condor_compile would exit with a bad exit code.
- The sort order of condor_status output accidentally changed in
Condor version 7.3,
so that the output was sorted on the slot name first, then machine name.
The behavior is now restored to the original sorting: first on machine name,
then slot name.
- If one machine running a parallel job crashed,
and job leases are enabled (which they are by default),
the job would not exit until the job lease duration expired.
As the condor_starter will not get respawned,
there is no need to wait.
Many sites set long job lease durations,
to prevent jobs from being killed when the machine running
the condor_schedd daemon reboots.
Now, if one node goes away, the whole computation is shut down immediately.
- Fixed the verbosity level of some condor_dagman messages written to
the dagman.out file.
- Fixed a bug introduced in Condor version 7.3.2 that resulted in
messages such as the following even in cases where no problem in
communicating with the condor_collector had been encountered:
Collector <X> is still being avoided if an alternative succeeds.
This problem was believed to be fixed in Condor 7.4.1, but some cases
of the problem remained in that version.
- Fixed a bug from Condor version 6.1.14,
that resulted in the condor_schedd performing
the operation scheduled via WALL_CLOCK_CKPT_INTERVAL at the
specified frequency (default time of 1 hour),
multiplied by the number of times the
condor_schedd daemon had been reconfigured during its lifetime.
This could lead to degraded performance,
especially prior to Condor version 7.4.1,
when this operation was more disk-intensive.
- 32-bit Linux versions of Condor running in a 64-bit environment would
sometimes not detect the existence of some processes and sometimes
wrongly detect that a tracked process belonged to root when it
actually belonged to some other user. This could lead to failure to run
jobs or failure to properly monitor and clean up after them. When the wrong
process ownership problem happened,
the following message appeared in the condor_master and/or condor_procd
logs:
ProcAPI: fstat failed in /proc! (errno=75)
If condor_procd failed to detect the existence of its own parent process,
it would exit with the following message in its log:
ERROR: master has exited
- Fixed a problem in the condor_job_router daemon,
introduced in Condor version 7.2.2,
that could cause the daemon to crash when failing to carry out the change
of state dictated by a job's periodic policy expressions,
for example, the failure to put a job on hold when periodic_hold
becomes True.
- Fixed a bug introduced in Condor 7.3.2 that caused Grid Monitor
jobs to receive a full X.509 proxy. Now, it always receives a limited
proxy, which was the previous behavior.
- Fixed a bug that could cause the nordugrid_gahp to crash.
- Fixed a problem introduced in 7.4.0 that could cause two
condor_schedd daemons
with a match to the same slot to both fail to claim it, rather than
letting the first one to claim it succeed. This sort of situation
can happen when the condor_negotiator has a stale view of the pool,
either because the gap between negotiation cycles is configured to
be shorter than usual, or because updates from the condor_startd
to the condor_collector
are not reliably delivered and processed.
- The condor_kbdd is no longer ignored by the condor_startd
when the configuration variable CONSOLE_DEVICES is defined.
- When using the -d command line argument with a daemon,
the values of LOG, SPOOL, and EXECUTE
no longer change every time a condor_reconfig command is received.
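The condor_status sort-order fix in the list above amounts to swapping the order of two sort keys. The sketch below shows both orderings in Python on made-up data; the host names and the (machine, slot) tuple layout are assumptions for illustration, and plain string comparison is used, which may differ from how condor_status actually compares names.

```python
# Hypothetical (machine name, slot name) pairs.
slots = [
    ("b.example.org", "slot1"),
    ("a.example.org", "slot2"),
    ("a.example.org", "slot1"),
    ("b.example.org", "slot2"),
]

# Restored behaviour: machine name first, then slot name.
restored = sorted(slots, key=lambda s: (s[0], s[1]))

# Accidental 7.3 behaviour: slot name first, then machine name.
regressed = sorted(slots, key=lambda s: (s[1], s[0]))

print("restored: ", restored)
print("regressed:", regressed)
```

With the restored ordering, all slots of one machine appear together; with the regressed ordering, slots of different machines are interleaved.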
Known Bugs:
- The condor_kbdd has a chance of entering an infinite loop
on platforms that use X-Windows. Microsoft Windows and Mac OS X
are not affected. Removing KBDD from DAEMON_LIST is a
workaround, although this impairs Condor's ability to detect
console usage. This bug is fixed in Condor version 7.4.3.
Additions and Changes to the Manual:
- Descriptions of all the commands that may be placed into a
submit description file are now located within the condor_submit
manual page, instead of within Chapter 2, the Users' Manual.
- An initial, but not yet complete, set of configuration variables
that require a restart when changed
is listed in section 3.3.1.
Using condor_reconfig to change these variables' values is not sufficient.
Version 7.4.1
Release Notes:
- Security Item: A flaw was found that could allow a user who already is authorized to
submit jobs into Condor, to queue a job under the guise of a different
user. In this way, someone who has access to a Condor submission
service and is allowed to submit jobs into Condor could gain access to
another non-root or non-administrator account on the system.
This flaw was discovered during the development process; no incidents
have been reported. Details of the problem will be made available on Feb 1st,
2010.
- The default value of JOB_ROUTER_NAME has changed
from an empty string to jobrouter in order to address
problems caused by the previous default. Without special handling,
this means that jobs being managed by condor_job_router before
upgrading will not be adopted by the new version of
condor_job_router if the default JOB_ROUTER_NAME was
being used. To correct this, follow the instructions given in the
description of JOB_ROUTER_NAME.
New Features:
- Condor allows submit files to specify an IwdFlushNFSCache
expression,
to control whether or not Condor tries to flush the NFS cache for
a job's initial working directory on job completion.
- The new -attributes option to condor_status
explicitly specifies the attributes to be listed when using the
-xml or -long options.
Configuration Variable and ClassAd Attribute Additions and Changes:
- New VOMS attributes have been introduced into the job ad to keep them
separate from the X509UserProxySubjectName.
- The default for JOB_ROUTER_NAME has changed from an
empty string to jobrouter. See the release notes for more
information about upgrading from an old version.
- The configuration variable TCP_FORWARDING_HOST
has existed in Condor since version 7.0.0, but was not documented.
See section 3.3.6 for documentation.
- The new configuration variable STARTD_PER_JOB_HISTORY_DIR
allows ClassAds of completed jobs to be stored in a directory separate
from the existing one specified with PER_JOB_HISTORY_DIR.
Bugs Fixed:
- Condor no longer creates the job sandbox in its SPOOL
directory if it is not needed.
- Fixed a problem introduced in Condor version 7.4.0 that caused GSI
authentication between Condor processes to fail when using a
non-legacy format X.509 proxy.
- Fixed a problem with CCB under Windows platforms that has existed since
Condor version 7.3.0.
This problem caused CCB-enabled daemons to become unresponsive
after the exit of a child process.
- Improved the handling of previously-submitted gt2 grid jobs upon
release from hold, when there is no Globus job manager for the job running
on the remote resource.
- Fixed a problem with job leases for jobs that use a condor_shadow.
Previously, while these jobs were running, lease renewals from the
submitter would not be
noticed, and the job would be aborted when the original lease expired.
- Fixed a bug that only allowed approximately 50 splices to be included into
a DAG input file. There is now no limit to the number of splices
one may include into a DAG input file except, of course, for the
implicit memory allocation limit of the condor_dagman process.
- Removed attempted limiting of swap space via ulimit -v using the
VirtualMemory machine ClassAd attribute in the script
condor_limits_wrapper.sh.
- Fixed a bug that caused ALLOW_CONFIG and
HOSTALLOW_CONFIG, as well as the corresponding
DENY configuration variables to incorrectly handle a
setting consisting of a single * or the equivalent */*. This
also fixes a bug that caused incorrect merging of ALLOW
and HOSTALLOW settings when one, but not both, consisted of
a single * or the equivalent */*.
These bugs have existed since before Condor version 6.8.
- Fixed a bug introduced in Condor version 7.3.0 that could cause
Condor daemons to crash when reading malformed network addresses.
- Removed a check for root ownership of a script specified by
the configuration variable VM_SCRIPT.
- Fixed a bug in writing the header of the file identified by
the configuration variable EVENT_LOG.
- Fixed a bug that could cause the condor_startd to segfault on shutdown
when using dynamic slots.
- Fixed a problem introduced in Condor version 7.3.2 that changed
the behavior of
an undocumented method for selecting attributes to be displayed in
condor_q -xml. Prior to this bug, the following command
would produce XML output with the attributes A and B,
plus a few other attributes that were always shown.
condor_q -xml -format "%s" A+B
In Condor versions 7.3.2 and 7.4.0,
this same command produced an empty XML ClassAd.
The workaround was to use multiple -format options, each listing
just one desired attribute, rather than a single one with an
expression of all desired attributes. Although this is now fixed, the
more straightforward way to select attributes since Condor version 7.3.2
is to use the -attributes option.
- Fixed a bug introduced in Condor version 7.3.2 that resulted in
messages such
as the following even in cases where no problem in communicating
with the condor_collector had been encountered:
Collector <X> is still being avoided if an alternative succeeds.
- Fixed a bug that has been in the condor_startd since before
Condor version 6.8. If the condor_startd ever failed to send signals to the
condor_starter process, it could fail to properly clean up the
machine ClassAd, leaving attributes from
STARTD_JOB_EXPRS in the ClassAd but not making them visible
in condor_status queries. One possible problem resulting from
this could be matches being made by the condor_negotiator that are then
rejected by the condor_startd. Repeated messages such as the following
would then result in the condor_startd log:
slot1: Request to claim resource refused.
- Fixed a problem that resulted in the following message in the
condor_startd log:
Timer -1 not found
- Fixed a problem in which security sessions were not cached
correctly when using CCB. This resulted in re-authentication in
some cases where a cached security session could have been used.
- Fixed multiple problems with the handling of VOMS attributes in GSI
proxies.
- Fixed a bug that caused condor_dagman to hang when running a
DAG with POST scripts, if the global event log is turned on.
- Improved how the private network address is published when using
the configuration variables PRIVATE_NETWORK_NAME and
PRIVATE_NETWORK_INTERFACE. In some cases, this
information was not being used, and therefore connections were made
to the public address when they could have been made to the private
address.
- Fixed a bug exhibited under Windows XP,
where using USE_VISIBLE_DESKTOP
would cause strange behavior after a job completed.
- CCB now works with TCP_FORWARDING_HOST. Previously,
the reverse connection was made to the private address rather than
to the host defined by TCP_FORWARDING_HOST.
- Removed a bad optimization that caused some information about job
execution to be lost during job completion or removal,
if a history file was not configured.
- Condor now checks whether the configuration variable
GRIDFTP_URL_BASE is set before
submitting cream grid jobs, as that variable is required for cream jobs
to function properly. If the variable is not set, cream jobs are put on
hold with an appropriate message.
- Fixed a bug that allowed running virtual machines to be leaked
if the condor_startd crashed.
- Fixed a bug in cream_gahp which could cause crashes when
there were more than 500 cream jobs queued.
- Improved recovery when Condor crashes during the submission of a cream
grid job. Before, affected jobs would remain in REGISTERED state
on the cream server, but never run.
- Improved the HoldReason message when cream grid jobs are
held by the condor_gridmanager.
- When naming a resource for a cream grid job, Condor now properly
recognizes the format used by the standard cream client UI:
https://foo.edu:8443/cream-pbs-cream_queue.
- The configuration variable SOAP_SSL_CA_FILE is now
consulted in addition to
SOAP_SSL_CA_DIR when authenticating
an https proxy for Amazon EC2, when AMAZON_HTTP_PROXY is defined.
- Previously, if condor_rm and friends were given both a constraint
and a user name or cluster id, they would act on all jobs matching the
constraint and all jobs associated with the user or cluster. Now, this
combination of arguments results in an error.
- Failure to purge a cream grid universe job from the remote server
because it was previously purged no longer results in the job being held.
- The condor_gridmanager now recognizes VOMS attributes in X.509
proxies and will handle them appropriately. For example, it recognizes
that two proxies with the same identity but different VOMS attributes may
be mapped to different accounts on a remote machine.
- Fixed a bug in condor_dagman, introduced in 7.3.2, that
caused condor_dagman running on Windows to hang on any DAG using
more than one log file for the node jobs.
- Fixed a bug in condor_dagman, introduced in 7.3.2, that could
cause condor_dagman to fail on a DAG using node job log files on
multiple devices, if log files on different devices happened to have
the same inode number.
- Fixed a bug that caused the condor_schedd daemon to segfault when
spooling more than 9 files.
- Fixed a bug that caused the condor_startd daemon to crash on
Debian Stable.
- Fixed keyboard activity detection on the Windows XP platform.
- Fixed a bug in the condor_had daemon that caused it to not start
the controlled daemon if CCB was enabled.
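The change to condor_rm and related tools noted in the list above (a constraint combined with a user name or cluster id) can be modelled as follows. This is a hypothetical Python model of the described selection semantics, not code from Condor; the job dictionaries, the function names, and the error text are all invented for illustration.

```python
# Hypothetical job records; attribute names mimic job ClassAd attributes.
jobs = [
    {"ClusterId": 1, "Owner": "alice", "JobPrio": 0},
    {"ClusterId": 2, "Owner": "bob", "JobPrio": 10},
    {"ClusterId": 3, "Owner": "bob", "JobPrio": 0},
]


def select_jobs_old(jobs, constraint=None, owner=None):
    """Behaviour before 7.4.1 as described: the union of both selections."""
    selected = []
    for job in jobs:
        matches_constraint = constraint is not None and constraint(job)
        matches_owner = owner is not None and job["Owner"] == owner
        if matches_constraint or matches_owner:
            selected.append(job["ClusterId"])
    return selected


def select_jobs_new(jobs, constraint=None, owner=None):
    """Behaviour from 7.4.1 as described: the combination is an error."""
    if constraint is not None and owner is not None:
        # Invented message; the real tools print their own error text.
        raise ValueError("constraint cannot be combined with a user or cluster")
    return select_jobs_old(jobs, constraint, owner)


high_prio = lambda job: job["JobPrio"] > 5
print(select_jobs_old(jobs, constraint=high_prio, owner="alice"))
```

In the old model, the call selects alice's job and bob's high-priority job together, which is broader than either argument alone suggests; rejecting the combination removes that surprise.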
Known Bugs:
- The condor_kbdd has a chance of entering an infinite loop
on platforms that use X-Windows. Microsoft Windows and Mac OS X
are not affected. Removing KBDD from DAEMON_LIST is a
workaround, although this impairs Condor's ability to detect
console usage. This bug is fixed in Condor version 7.4.3.
- condor_dagman may fail on Windows if the set of node job log
file names includes multiple paths that are hard links (not symbolic links)
to the same file.
- condor_dagman PRE and POST script arguments (and the names of
the scripts themselves) cannot contain spaces.
- condor_dagman VARS values cannot contain single quotes.
Additions and Changes to the Manual:
- Added documentation about how to include spaces (and other
special characters) in condor_dagman VARS values.
Version 7.4.0
Release Notes:
New Features:
- Condor is now integrated with the Hadoop Distributed File System (HDFS).
See documentation in section 3.13.2 and
section 3.3.23.
- condor_q using the options -analyze and -better-analyze
now provides analysis for scheduler and local universe jobs.
Specifically, the START_SCHEDULER_UNIVERSE and
START_LOCAL_UNIVERSE expressions are checked.
- Added the ClassAd attributes
TotalLocalRunningJobs, TotalLocalIdleJobs,
TotalSchedulerRunningJobs, and TotalSchedulerIdleJobs
to the published ClassAd for the condor_schedd. This means that
condor_q -analyze can still give helpful information about
why local or scheduler universe jobs are idle when
the configuration variables START_LOCAL_UNIVERSE or
START_SCHEDULER_UNIVERSE refer to these attributes.
These attributes were already present internally within
the condor_schedd daemon,
just not published.
- The condor_vm-gahp now supports KVM and links with libvirt, rather
than calling virsh command-line tools.
- Greatly improved the condor_gridmanager's scalability when handling
many grid type gt2 grid universe jobs. Improvements include more quickly
processing updated X.509 certificates, not checking jobs for status updates if
they have not been submitted to the remote site, and eliminating unnecessary
updates to the condor_schedd daemon.
- Latency in the submission and cleaning up of Condor-C jobs
has been improved by changing the default value of
C_GAHP_CONTACT_SCHEDD_DELAY from 20 to 5.
- The eval() ClassAd function added in Condor version 7.3.2
is now also understood by the condor_job_router and
condor_q using the -better-analyze option.
- The submit command run_as_owner is now supported
for Unix platforms. Previously, it was only supported on Windows platforms.
- When setting AMAZON_HTTP_PROXY, a username and password
can now be given as part of the proxy URL.
The value of SOAP_SSL_CA_DIR is now consulted when authenticating
an https proxy for Amazon EC2, when AMAZON_HTTP_PROXY is defined.
- The condor_collector daemon now advertises to itself, and will appear
in the output of condor_status -collector.
- Optimizations in core Condor systems should provide minor speed
improvements.
- Updated the maximum log size to the operating system's maximum file size.
Configuration Variable and ClassAd Attribute Additions and Changes:
- The undocumented configuration variable
TOOLS_PROVIDE_OLD_MESSAGES is no longer recognized by Condor.
- The new configuration variable
SCHEDD_JOB_QUEUE_LOG_FLUSH_DELAY sets an
upper bound in seconds on how long it takes for changes to the job
ClassAd to be visible to the Condor Job Router and to Quill.
The default value is 5 seconds.
Previously, there was no upper limit. Typically, other activity in
the job queue, such as jobs being submitted or completed would cause
buffered data to be flushed to disk, such that the effective upper bound was
a function of how busy the job queue was.
- The default configuration file now uses
ALLOW/DENY in place of
HOSTALLOW/HOSTDENY. See the release notes above
for more information.
- The default value for MAX_JOBS_RUNNING has changed.
Previously, it was 200. Now it is defined by an expression that depends
on the total amount of memory and the operating system. The default
expression requires 1MByte of RAM per running job, on the submit machine.
In some environments and configurations, this is overly
generous and can be cut by as much as 50%. Under Windows, the
number of running jobs is still capped at 200.
A 64-bit version of Windows is recommended in order to raise the value
above the default.
Under Unix, the maximum default is now 10,000. To scale higher, we
recommend that the system ephemeral port range is extended
such that there are at least 2.1 ports per running job.
- The default value of RESERVED_SWAP has changed to
the value 0, which
disables the condor_schedd daemon's check for sufficient swap space
before starting more jobs. The new expression defined with
MAX_JOBS_RUNNING has a more appropriate memory check, so
the configuration variable RESERVED_SWAP will no longer
be used in the near future.
For cases where
RESERVED_SWAP is not set to 0, the default value
of SHADOW_SIZE_ESTIMATE has changed to 800 Kbytes.
Previously, it was 200 if not set,
but it was set to 1800 in the example configuration file.
- The default values of START_LOCAL_UNIVERSE and
START_SCHEDULER_UNIVERSE have changed. Previously,
these were set to True. Now, they are set using an expression
that allows
up to 200 local universe and 200 scheduler universe jobs to run.
- The default value of
GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE has
changed from 100 to 1000.
- The default value of NEGOTIATOR_INTERVAL
has changed from 300 to 60.
- The default value of ENABLE_GRID_MONITOR has been
changed from False to True. This variable
was assigned to True in the example configuration file, so
the change in default value now matches the value set in the example
configuration.
- The configuration variable VM_VERSION has been removed,
as has the machine ClassAd attribute of the same name.
When the virtual machine version information is needed in the machine ClassAd,
the configuration variable STARTD_ATTRS can be used to
add it.
- The default configuration now uses
VM_BRIDGE_SCRIPT and VM_SCRIPT in place of
XEN_BRIDGE_SCRIPT and XEN_SCRIPT due to the
support of KVM.
Submit description file commands have also been added, and they include:
kvm_disk, kvm_transfer_files,
and kvm_cd_rom_device.
- The configuration variables XEN_DEFAULT_KERNEL
and XEN_DEFAULT_INITRD have been removed.
Corresponding to this, the submit description file command
xen_kernel = any is no longer valid.
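The sizing rules given above for the new MAX_JOBS_RUNNING default (1 MByte of RAM per running job, a cap of 200 on Windows, a maximum default of 10,000 on Unix, and at least 2.1 ephemeral ports per running job) lend themselves to a quick back-of-the-envelope calculation. The sketch below is written in Python and only restates those numbers; the actual default expression is not reproduced in these notes, so both function names and the exact way memory is combined with the cap are assumptions.

```python
def sketch_default_max_jobs_running(total_memory_mb, is_windows):
    """Estimate the default limit from the rules stated in the notes.

    Assumption: the limit is the smaller of (total memory in MBytes,
    at 1 MByte per running job) and the platform cap.
    """
    platform_cap = 200 if is_windows else 10_000
    return min(int(total_memory_mb), platform_cap)


def ephemeral_ports_needed(running_jobs):
    """Ports needed at 2.1 per running job, rounded up.

    Integer arithmetic (21/10) avoids floating-point rounding surprises.
    """
    return (21 * running_jobs + 9) // 10


for memory_mb, windows in ((4096, False), (65536, False), (4096, True)):
    limit = sketch_default_max_jobs_running(memory_mb, windows)
    print(memory_mb, "MB,", "Windows" if windows else "Unix", "->",
          limit, "jobs,", ephemeral_ports_needed(limit), "ports")
```

Under these assumptions, a Unix submit machine that reaches the 10,000-job maximum would want an ephemeral port range of at least 21,000 ports.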
Bugs Fixed:
Known Bugs:
- The condor_kbdd has a chance of entering an infinite loop
on platforms that use X-Windows. Microsoft Windows and Mac OS X
are not affected. Removing KBDD from DAEMON_LIST is a
workaround, although this impairs Condor's ability to detect
console usage. This bug is fixed in Condor version 7.4.3.
- There are multiple bugs related to using VOMS attributes.
In Condor version 7.4.0, VOMS support should be disabled by setting
the configuration variable USE_VOMS_ATTRIBUTES = FALSE.
- A configuration variable of USE_VISIBLE_DESKTOP set
to True will corrupt the visible desktop.
This bug is present back through Condor version 7.2.4.
This configuration variable did not work at all in 7.2 releases
prior to 7.2.4. This bug will be fixed in Condor version 7.4.1.
- If the global event log (see section 3.3.4) is
turned on, condor_dagman will hang when running any DAG that has
POST scripts.
- condor_dagman will hang on Windows when running any DAG that
uses more than one log file for the node jobs.
Additions and Changes to the Manual:
- See section 3.13.2 and
section 3.3.23 for preliminary documentation of
Condor's integration with the Hadoop Distributed File System (HDFS).
condor-admin@cs.wisc.edu