8.4 Stable Release Series 7.6
This is a stable release series of Condor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 7.6.x releases.
New features will be added in the 7.7.x development series.
The details of each version are described below.
Version 7.6.5
Release Notes:
- Condor version 7.6.5 has not yet been released.
New Features:
- Added explicit support for Linux kernels with a major version number of 3,
so that load average information is detected and used.
(Ticket #2579).
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.4
Release Notes:
- Condor version 7.6.4 released on October 21, 2011.
New Features:
- The Windows-only tool condor_rmdir was first included in Condor version 7.6.0,
but no version history entry was written for it at that release.
This item corrects that oversight,
and notes that use of condor_rmdir,
instead of the built-in Windows rmdir,
is enabled by default.
condor_rmdir worked correctly in Condor version 7.6.0,
contained a bug in Condor version 7.6.1,
and was fixed in Condor version 7.6.2.
(Ticket #1877).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable <Keyword>_HOOK_JOB_EXIT_TIMEOUT
defines the number of seconds that the condor_starter will wait
before continuing with a shut down,
if a hook defined by <Keyword>_HOOK_JOB_EXIT has not completed.
The addition of this configuration variable fixes the bug described below.
(Ticket #2543).
- The new configuration variable SKIP_WINDOWS_LOGON_NETWORK
is a boolean value which specifies whether the Windows
LOGON_NETWORK authentication is skipped or not.
If skipped, Condor tries LOGON_INTERACTIVE authentication first.
The addition of this configuration variable fixes the bug described below.
(Ticket #2549).
- The new configuration variable SHADOW_RUN_UNKNOWN_USER_JOBS
defaults to False.
When True,
it allows the condor_shadow daemon to run jobs remotely submitted from
users not in the local password file.
(Ticket #2004).
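The timeout behavior described for <Keyword>_HOOK_JOB_EXIT_TIMEOUT can be pictured with a minimal Python sketch. This is not Condor's implementation: the function name run_exit_hook and its parameters are hypothetical, and the hook is stood in for by an arbitrary command. Note that subprocess.run also kills the child when the timeout expires; these notes do not say whether the condor_starter does the same.

```python
import subprocess
import sys


def run_exit_hook(command, timeout_seconds):
    """Run a job exit hook, waiting at most timeout_seconds for it.

    Returns True if the hook finished in time, and False if the wait
    was abandoned so that shut down can continue.
    Hypothetical helper for illustration only; not part of Condor.
    """
    try:
        subprocess.run(command, timeout=timeout_seconds)
        return True
    except subprocess.TimeoutExpired:
        # The hook did not complete in time: stop waiting and
        # let the caller continue with its shut down.
        return False


if __name__ == "__main__":
    quick_hook = [sys.executable, "-c", "pass"]
    slow_hook = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_exit_hook(quick_hook, timeout_seconds=30))
    print(run_exit_hook(slow_hook, timeout_seconds=1))
```

The point of the sketch is only that a hook that never returns can no longer block the shut down indefinitely.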
Bugs Fixed:
- Implemented proper support of values greater than or equal to 2 GBytes
set for the configuration variable MAX_<SUBSYS>_LOG.
(Ticket #2471).
- Updated the condor_negotiator daemon's assessment of pool size
to properly take partitionable slots into account.
See section 3.13.9 for an explanation of
partitionable slots on SMP machines.
(Ticket #2440).
- Provided an informative error message
when the condor_userprio tool cannot locate the condor_negotiator daemon.
(Ticket #2478).
- condor_userprio and the condor_negotiator daemon
did not correctly handle the names of submitters,
when these names exceeded 63 characters in length.
The fix handles submitter names of arbitrary length.
(Ticket #2445).
- Removed a spurious boolean flag reset in condor_q,
which resulted in an order dependency between the command line arguments
-long and -format.
(Ticket #2498).
- Fixed a bug in which a graceful shutdown of a condor_startd
did not correctly handle jobs using job deferral
which have landed on an execute machine but have not yet
reached their deferral time.
These jobs would appear to be running, despite the lack of
a condor_starter daemon.
These jobs now correctly transition to the idle state.
(Ticket #2486).
- Corrected a hierarchical group quota bug in which
the user accounting mechanism in the condor_negotiator daemon allowed
submitter records to be deleted,
if the submitter's priority factor was explicitly set and
the value was equal to that defined by DEFAULT_PRIO_FACTOR.
(Ticket #2442).
- Fixed CPU detection on Windows, such that the correct number of CPUs
is detected when there are more than 32 CPUs.
(Ticket #2381).
- Fixed a memory leak in the condor_negotiator,
caused by the failure to
free memory returned from some calls to param_without_default().
(Ticket #2299).
- Jobs run via glexec always had their PATH environment
variable cleared. Now, if PATH was specified for the job,
in any of the ways that job environment may be specified,
this setting is used.
(Ticket #2096).
- Fixed an infinite loop that could happen in Condor daemons
shortly after the receipt of a new connection.
This problem was introduced in Condor version 7.5.6.
(Ticket #2413).
- Fixed a problem in condor_hdfs that caused it to exit shortly
after starting up,
if the configuration variables
HDFS_DENY, HOSTDENY_WRITE, or HOSTDENY_READ
were not defined.
Previously, if HDFS_DENY was
not defined, HOSTDENY_WRITE and HOSTDENY_READ
were used to build the deny list.
Now DENY_WRITE and DENY_READ are also used.
(Ticket #2414).
- Removed an extra copy of the java files required to run gt4 and gt42
grid universe jobs. This does not affect Condor's operation.
(Ticket #2435).
- Fixed a problem that caused the condor_schedd to crash when
writing to some user logs with specific names. The specific names that
caused crashes are not easy to describe.
(Ticket #2439).
- Fixed a bug in which the condor_schedd failed to start up
when any job ClassAd attribute value contained the ASCII character 255.
(Ticket #2450).
- Fixed a bug in which condor_preen failed to honor the
-remove option, and would always remove lock files.
(Ticket #2497).
- condor_preen expected an old format for local lock file paths;
it now understands the proper format.
(Ticket #2496).
- condor_preen would EXCEPT when processing multiple
subdirectories for local locks.
(Ticket #2495).
- In 32-bit Condor binaries, the ImageSize of processes larger than
4 GBytes was reported as 4 GBytes. This limit has been raised to 4095 GBytes.
- vm universe jobs using Xen or KVM would fail to start,
if the disk image files were transferred from the submit machine
and the default value defined for LIBVIRT_XML_SCRIPT was used.
The script did not provide absolute path names for the files.
(Ticket #2511).
- Fixed a bug in which a completed Xen or KVM vm universe
job's modified disk image files would not be transferred back
to the submit machine.
(Ticket #2530).
- Fixed a bug in which a condor_starter configured to use job hooks
could fail to run a job
and then exit without waiting for the job exit hook to complete.
The bug fix introduces the new configuration variable
<Keyword>_HOOK_JOB_EXIT_TIMEOUT,
which defines the number of seconds the condor_starter will wait
before continuing with a shut down,
if the job exit hook has not completed.
(Ticket #2543).
- In Condor version 7.5.4, an improvement was made to avoid reliance on
the machine specified by NEGOTIATOR_HOST
matching a reverse DNS look up of the condor_negotiator.
However, this improvement was not made to the dedicated scheduler,
so matchmaking of parallel jobs was still subject to the
problems associated with the old algorithm.
Now, the dedicated scheduler benefits from the same improved algorithm as the
non-dedicated scheduler.
(Ticket #2540).
- Occasionally there have been problems with Windows
LOGON_NETWORK authentication,
leading to users being locked out from their account.
The new configuration variable SKIP_WINDOWS_LOGON_NETWORK,
when set to True,
fixes the problem by allowing this mechanism to be skipped entirely,
instead proceeding straight to the LOGON_INTERACTIVE authentication.
This bug only affected users using the condor_credd.
(Ticket #2549).
- Condor now correctly groups CREAM jobs based on how CREAM servers
authorize and map them.
The servers map them based on X.509 proxy subject name
and first VOMS attribute.
Previously, all VOMS attributes were considered.
This could cause unexpected behavior due to the aliasing of CREAM leases
and proxy delegations.
(Ticket #2271).
- Communication errors in the job lease renewal protocol were
sometimes not correctly handled. This resulted in the job being
killed.
(Ticket #2563).
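The CREAM grouping fix above (Ticket #2271) amounts to a change in the key used to group jobs. A minimal Python sketch of the corrected grouping follows. The dictionary field names (id, proxy_subject, voms_attributes) and both function names are hypothetical; they do not come from Condor's source.

```python
from collections import defaultdict


def cream_group_key(job):
    """Build the grouping key from the X.509 proxy subject name and
    the FIRST VOMS attribute only (hypothetical field names)."""
    voms = job.get("voms_attributes") or [None]
    return (job["proxy_subject"], voms[0])


def group_jobs(jobs):
    """Map each grouping key to the list of job ids that share it."""
    groups = defaultdict(list)
    for job in jobs:
        groups[cream_group_key(job)].append(job["id"])
    return dict(groups)


if __name__ == "__main__":
    jobs = [
        {"id": 1, "proxy_subject": "/CN=alice", "voms_attributes": ["/vo/a", "/vo/x"]},
        {"id": 2, "proxy_subject": "/CN=alice", "voms_attributes": ["/vo/a", "/vo/y"]},
        {"id": 3, "proxy_subject": "/CN=alice", "voms_attributes": ["/vo/b"]},
    ]
    for key, ids in group_jobs(jobs).items():
        print(key, ids)
```

Jobs 1 and 2 differ only in their second VOMS attribute, so they fall into the same group. A key built from all VOMS attributes would have separated them.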
Known Bugs:
Additions and Changes to the Manual:
- The manual now contains a manual page for condor_rmdir,
the Windows-only replacement for the built-in Windows rmdir
that was introduced in Condor version 7.6.0.
Version 7.6.3
Release Notes:
- Condor version 7.6.3 released on August 23, 2011.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug causing parallel universe jobs to be preempted upon
renewal of the job lease,
which by default happens within 20 minutes.
This meant that essentially no parallel universe job that took
longer than 20 minutes would ever finish.
(Ticket #2317).
- When the specified job requirements expression contained a
reference to RequestMemory, there was inconsistent behavior:
in some cases the default RequestMemory requirements were
suppressed, and in other cases not. Now, the default
RequestMemory requirements are always suppressed when there
are explicit references to RequestMemory in the job
requirements.
- Fixed a bug that could cause Condor to crash when using
the Local Credential Mapping Service (LCMAPS) with
GSI authentication.
(Ticket #2340).
- Fixed a bug that caused the condor_collector daemon to crash
upon receiving a CCB command,
when ENABLE_CCB_SERVER was changed from True to False
without restarting the daemon.
(Ticket #2357).
- The GT2 GAHP no longer consumes all of the CPU when compiled
with threaded Globus libraries.
(Ticket #2345).
- Fixed a problem introduced in Condor version 7.5.6,
which led to local lock files for user log locking always being created,
even when ENABLE_USERLOG_LOCKING was set to False.
(Ticket #2116).
- Installation as a service by the MSI installer on Windows platforms
now sets the service startup type to Automatic Delayed by default.
(Ticket #2318).
- In PrivSep mode, if started as root,
the condor_master re-executes itself as the condor user.
Previously, supplementary groups were preserved.
Now supplementary groups are cleared and set to the list of groups
to which the condor user belongs.
(Ticket #2376).
- Fixed a bug where setting DAGMAN_PROHIBIT_MULTI_JOBS to
True caused SUBDAGs to stop working.
(Ticket #2331).
- Fixed a bug that caused scheduler universe jobs submitted via
Condor-C or condor_submit -spool to be held and be unable to run.
The hold reason given was "File <filename> is missing or not executable".
(Ticket #2396).
- condor_submit now exits with an error,
if the command hold = True is in the submit description file
when using -spool or -remote as command-line arguments.
This combination of settings resulted in jobs being unable to run.
(Ticket #2398).
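The RequestMemory item above describes a simple rule: the default RequestMemory requirements are added only when the job's own requirements expression does not mention RequestMemory. A minimal Python sketch of that rule follows. The function name is hypothetical, and DEFAULT_MEMORY_CLAUSE is a placeholder, since these notes do not give the actual default clause. The textual match is a simplification: it would also be triggered by the word appearing inside a string literal.

```python
import re

# Placeholder for whatever default clause is normally appended;
# the real clause is not stated in these release notes.
DEFAULT_MEMORY_CLAUSE = "(TARGET.Memory >= RequestMemory)"


def final_requirements(user_requirements):
    """Return the requirements expression after applying the rule.

    If the user's expression refers to RequestMemory, it is left alone;
    otherwise the default memory clause is appended.
    """
    # ClassAd attribute names are case-insensitive, so match accordingly.
    if re.search(r"\bRequestMemory\b", user_requirements, re.IGNORECASE):
        return user_requirements
    return f"({user_requirements}) && {DEFAULT_MEMORY_CLAUSE}"


if __name__ == "__main__":
    print(final_requirements('Arch == "X86_64"'))
    print(final_requirements("TARGET.Memory >= 2 * RequestMemory"))
```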
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.2
Release Notes:
- Condor version 7.6.2 released on July 19, 2011.
New Features:
- Improved how condor_dagman deals with certain parse errors:
missing node name or submit description file in JOB lines.
Also, condor_dagman
now prints DAG input file lines as they are parsed,
if the debug verbosity setting is 6 or above,
as set with the condor_submit_dag command line option -debug.
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug in the condor_negotiator that impacted the processing
of machine RANK such that condor_startd RANK
preemption only occurred if the preempting user had sufficient user priority
to claim another machine.
- condor_ssh_to_job did not work on systems using the
dash shell for /bin/sh.
- condor_ssh_to_job now works with jobs that are run via
glexec. Previously, it did not.
- When glexec was configured with linger=on,
the condor_starter would become unresponsive for the duration of the job.
For jobs longer than the value set by configuration variable
NOT_RESPONDING_TIMEOUT,
this caused the job to be aborted.
This also prevented job resource usage monitoring from working
while the job was running.
- Fixed a bug in hierarchical group quotas that caused
a warning to be logged, despite correct implementation.
- condor_preen now properly respects the convention that
the -debug option causes dprintf() logging to stderr.
- Fixed a problem introduced in Condor version 7.5.5
that could cause the condor_schedd to crash when a job was removed
during negotiation or when an idle parallel universe job left the queue.
- Fixed a problem that sometimes caused the condor_procd to die.
The chain of events was as follows:
the condor_startd killed the condor_starter due to unresponsiveness,
the condor_procd then died,
and the condor_startd logged the message
ProcD has failed and exited.
- Fixed a problem introduced in Condor version 7.6.1
that caused the condor_shadow to crash without successfully putting the job
on hold when the user log could not be opened for writing.
- condor_history no longer crashes when given a constraint expression
longer than 512 characters.
- PBS and LSF grid jobs that arrive in a queue via Condor-C
or remote submission again work properly.
- Fixed a bug that could cause the condor_gridmanager to crash
when a CREAM job ClassAd was missing the X509UserProxy attribute.
- Fixed a bug that caused CREAM jobs to have incomplete input files,
if the condor_gridmanager crashed during transfer of those input files.
- Fixed a bug in condor_submit that caused grid jobs intended for
CREAM services whose names contain extra dashes to become held.
- Fixed a bug in which condor_submit would try,
but fail, to open the Deltacloud password file
when the file name was dependent on an expression specified with $$().
- If the Owner attribute was not set in the ClassAd associated
with a cluster of jobs,
shared spooled executables were not correctly cleaned up.
- Fixed a bug for 64-bit versions of Windows in which the
user condor-reuse-slot<N> showed up on the login screen.
- Fixed a bug introduced in Condor version 7.5.5,
which changed the default value of the configuration variable
INVALID_LOG_FILES from the empty set to a file called core.
This resulted in core files being removed unexpectedly by condor_preen,
and that complicated debugging of Condor.
Previous behavior has been restored.
- Fixed a bug existing since Condor version 7.5.5 on Windows platforms.
The installer installed Java jar files in the correct $(BIN) directory,
while the value for the configuration variable
JAVA_CLASSPATH_DEFAULT utilized the obsolete $(LIB) directory.
The installer now correctly sets JAVA_CLASSPATH_DEFAULT
to the $(BIN) directory.
- Fixed a problem causing Condor to fail to start when
privsep was enabled and the environment had any variables
containing newlines.
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.1
Release Notes:
- Condor version 7.6.1 released on June 3, 2011.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- A bug introduced in Condor version 7.5.5 caused the condor_schedd
to die when its attempt to claim a slot for a parallel universe job
was rejected by the condor_startd.
- In some cases, condor_q -analyze failed to provide detailed analysis of
the job's requirements expression when the expression contained ClassAd
function calls.
- Fixed an incorrect exit code from condor_q
when invoked with the -name option and using Quill.
- Fixed a segmentation fault bug introduced in Condor version 7.5.5,
in the checkpoint and restart of jobs using compressed checkpoint images
under the standard universe.
By default, Condor will not compress checkpoints under the standard universe.
Jobs which do not compress their checkpoints were immune to this bug.
Compressed checkpoints are only available in 32-bit versions of Condor.
Generation of checkpoints in 64-bit versions of Condor is unaffected.
- In Condor version 7.6.0, the condor_schedd would create a
spool directory for every job. The previous, correct behavior
has now been restored,
which is to create a spool directory only when needed.
- Fixed a bug introduced in Condor version 7.5.2,
that caused the condor_negotiator daemon to crash
if any machine ClassAds contained cyclical attribute references.
- Fixed a bug that caused usage by nice_user jobs to
be charged to the user directly rather than to "nice-user.user".
This bug was introduced in the 7.5 series.
- Fixed bugs in the RPM init script that could cause some
shutdown failures to go unreported,
and that could cause status requests,
such as service condor status,
to always report Condor as inactive.
- Fixed a bug in the condor_gridmanager that could cause a crash
when a grid type amazon job was missing required attributes.
- Fixed a bug in which the condor_shadow treated
the closed socket to the execute machine as an error
when it had already asked for the claim to be deactivated and the
condor_schedd daemon was busy.
As a result, a busy condor_schedd could result in the job being re-run.
- The matchmaking attributes
SubmitterUserResourcesInUse and RemoteUserResourcesInUse
no longer ignore SlotWeight, if used by the condor_negotiator.
- On Windows, the condor_kbdd daemon failed to detect changes to the
port on which the condor_startd was listening.
This resulted in failure of the condor_kbdd to send updates in
keyboard and mouse activity,
further causing the failure of policy implementation that relied upon
knowledge of the activity.
- Fixed a bug present throughout ClassAds,
in which expressions expecting a floating point value returned an error,
if the expression actually evaluated to a boolean.
This is most common in machine RANK expressions.
- Fixed a bug in the condor_negotiator daemon,
which caused a crash if the condor_negotiator was reconfigured
during a negotiation cycle,
but only if hierarchical group quotas were in use.
- Fixed a bug in which, when a job was submitted to the condor_schedd
remotely or with spooling,
the file transfer plug-ins were activated on the submit machine
and downloaded all of the URLs specified in the transfer list
to the spool directory.
URLs are now downloaded only
when the job is actually running under a condor_starter.
This is usually on an execute node, but could also be in the local universe.
- Removed the requirement that the Condor GAHP have DAEMON-level
authorization access to the condor_gridmanager.
- On Windows platforms only,
fixed a bug that could cause a sporadic access violation
when a Condor daemon spawned another process.
- Fixed a bug that would cause the condor_startd to
incorrectly report Benchmarking as its activity, instead of Idle
when there was a problem launching the benchmarking programs.
- Fixed a bug in which the condor_startd could get stuck in a loop,
trying to execute an invalid, non-existent Daemon ClassAd Hook job.
- Fixed a bug in which the dedicated scheduler did not correctly
deactivate claims,
tending to result in jobs that appear to move back and forth between
the Idle and Running states,
with the condor_shadow daemon exiting each time with status 108.
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.0
Release Notes:
- Condor version 7.6.0 released on April 19, 2011.
- Prior to Condor version 7.5.0, commenting out PREEN in the
default configuration file disabled condor_preen.
Starting in Condor version 7.5.0,
there was an internal default value for PREEN, so if
the configuration variable was not set in any configuration file,
condor_preen would still run.
To disable this functionality, PREEN can be explicitly set to
nothing.
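The distinction in the PREEN note above, between a variable that is not set at all and one that is explicitly set to nothing, can be sketched in Python as follows. The function name is hypothetical, and INTERNAL_DEFAULT_PREEN is a placeholder, since these notes do not state the actual internal default value.

```python
# Placeholder for the internal default introduced in version 7.5.0;
# the real value is not given in these release notes.
INTERNAL_DEFAULT_PREEN = "<internal default path to condor_preen>"


def effective_preen(config):
    """Return the program to run for PREEN, or None if it is disabled.

    config is a dict of configuration variables read from the
    configuration files (a simplification for illustration).
    """
    if "PREEN" not in config:
        # Not set anywhere (for example, commented out):
        # the internal default applies, so condor_preen still runs.
        return INTERNAL_DEFAULT_PREEN
    value = config["PREEN"].strip()
    # Explicitly set to nothing: condor_preen is disabled.
    return value or None


if __name__ == "__main__":
    print(effective_preen({}))
    print(effective_preen({"PREEN": ""}))
```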
New Features:
- Condor can now create and manage virtual machine instances in a
cloud service via Deltacloud. This is done via the new
deltacloud grid type in the grid universe.
See section 5.3.10 for details.
- Improved scalability of submission of cream grid type jobs.
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable DELTACLOUD_GAHP specifies
where the deltacloud_gahp binary is located. This binary is used to
manage deltacloud grid type jobs in the grid universe.
In a normal Condor installation, the value should be
$(SBIN)/deltacloud_gahp.
- Several new job ClassAd attributes have been added to support
the deltacloud grid type in the grid universe.
These attributes are taken from the submit description file:
DeltacloudUsername,
DeltacloudPasswordFile,
DeltacloudImageId,
DeltacloudRealmId,
DeltacloudHardwareProfile,
DeltacloudHardwareProfileCpu,
DeltacloudHardwareProfileMemory,
DeltacloudHardwareProfileStorage,
DeltacloudKeyname, and
DeltacloudUserData.
These attributes are set by Condor when the instance runs:
DeltacloudAvailableActions,
DeltacloudPrivateNetworkAddresses,
DeltacloudPublicNetworkAddresses.
See section 5.3.10 for details of running jobs under
Deltacloud, and see section 10
for definitions of these job ClassAd attributes.
- The configuration variable JAVA_MAXHEAP_ARGUMENT
has been removed.
This means that Java universe jobs will now run with the JVM's
default maximum heap setting,
unless specified otherwise by the administrator using the configuration
variable JAVA_EXTRA_ARGUMENTS,
or by the job via
java_vm_args in the submit description file
as described in section 2.8.
- The configuration variable TRUST_UID_DOMAIN
was set to True in the default condor_config.local
in the rpm and Debian packages. This is no longer the case.
This setting will therefore use the default value False.
- The configuration variable NEGOTIATOR_INTERVAL was set
to 20 in the default condor_config.local in the rpm and
Debian packages. This is no longer the case. This setting
therefore will use the default value 60.
Bugs Fixed:
- Fixed a bug in condor_dagman that caused it to fail when in recovery
mode in the face of a specific combination of node job failures with
retries.
- Fixed a bug that resulted in the spooled user log not being
fetched by condor_transfer_data. Prior to Condor version 7.5.4, this
problem affected spooled jobs submitted with an explicit list of
output files to transfer. In Condor version 7.5.4, this problem also
affected spooled jobs that auto-detected output files.
- When a job is submitted with the -spool option to condor_submit,
the condor_schedd now correctly writes the submit event to the user log
in its spool directory.
Previously, it would try to write the event using the user
log path given to condor_submit by the user,
which the condor_schedd may not be able to access.
- Fixed a file descriptor leak in the condor_vm-gahp. The leak would
cause the daemon to fail if a VMware job ran for more than 16 hours on a
Linux machine.
- Fixed a bug in condor_dagman that caused it to treat any dollar
sign in the log file name of a node job's submit description file
as an illegal condor_dagman macro.
Now only the sequence of characters $( delimits a macro.
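The change to macro detection in node job log file names can be illustrated with a short Python sketch. Both function names are hypothetical, and the "old" check is an assumed simplification of the previous behavior, inferred from the description above rather than taken from condor_dagman's source.

```python
def old_looks_like_macro(log_file_name):
    """Assumed previous behavior: any dollar sign was treated as a macro."""
    return "$" in log_file_name


def new_looks_like_macro(log_file_name):
    """Behavior after the fix: only the sequence $( delimits a macro."""
    return "$(" in log_file_name


if __name__ == "__main__":
    for name in ["job$1.log", "price_in_$.log", "$(CLUSTER).log"]:
        print(name, old_looks_like_macro(name), new_looks_like_macro(name))
```

Under the new check, a file name such as job$1.log is accepted, while $(CLUSTER).log is still recognized as containing a macro.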
Known Bugs:
- There are two known issues related to the automatic creation
of checkpoints with the Condor checkpointing library in
Condor version 7.6.0.
The first is that compression of
standalone checkpoints is disabled for 32-bit binaries.
It has always been disabled for 64-bit binaries.
A standalone checkpoint is one that is run outside
of Condor's standard universe. The second problem has to do with compressed
32-bit checkpoint files within the standard universe.
If a user requests a compressed 32-bit checkpoint in the standard universe,
the resulting checkpoint will not be compressed.
As with standalone checkpoints, this has never been supported
in 64-bit binaries. We hope to fix both problems in
Condor version 7.6.1.
Additions and Changes to the Manual: