A newer version can be found here
This is just my working collection of notes for various magic numbers in the bits of software I deal with on a regular basis. These are mostly extracted from the source files for the software in question. As it was originally written to be a set of notes for myself, I can't make any promises that it's accurate or up to date.
JobUniverse in job ClassAds
Value | Name | Description |
---|---|---|
0 | Min | A placeholder, not a universe |
1 | Standard | Single process relinked jobs |
2 | Pipe | A placeholder, no longer used |
3 | Linda | A placeholder, no longer used |
4 | PVM | Parallel Virtual Machine apps |
5 | Vanilla | Single process non-relinked jobs |
6 | PVMD | PVM daemon process |
7 | Scheduler | A job run under the schedd |
8 | MPI | Message Passing Interface jobs |
9 | Grid / Globus | Jobs managed by condor_gridmanager (V6.6: always Globus, V6.7: grid_type=gt2, gt3, gt5, condor, oracle, nordugrid...) |
10 | Java | Jobs for the Java Virtual Machine |
11 | Parallel | Generalized parallel jobs |
12 | Local | A job run under the schedd using a starter (advanced form of Scheduler). |
13 | Max | A placeholder, not a universe. |
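
A minimal sketch of turning a JobUniverse integer back into its name, using only the values from the table above. The helper names `universe_name` and `is_real_universe` are made up for illustration and are not part of Condor.

```python
# JobUniverse values copied from the table above.
JOB_UNIVERSE = {
    0: "Min",
    1: "Standard",
    2: "Pipe",
    3: "Linda",
    4: "PVM",
    5: "Vanilla",
    6: "PVMD",
    7: "Scheduler",
    8: "MPI",
    9: "Grid",
    10: "Java",
    11: "Parallel",
    12: "Local",
    13: "Max",
}

# Min and Max are placeholders, not universes.
PLACEHOLDERS = (0, 13)


def universe_name(value):
    """Return the table name for a JobUniverse integer (hypothetical helper)."""
    name = JOB_UNIVERSE.get(value)
    if name is None:
        return "Unknown (%d)" % value
    return name


def is_real_universe(value):
    """True if the value is in the table and is not a Min/Max placeholder."""
    return value in JOB_UNIVERSE and value not in PLACEHOLDERS


if __name__ == "__main__":
    for v in (5, 9, 13, 99):
        print(v, universe_name(v), is_real_universe(v))
```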
JobStatus in job ClassAds
Value | Name | Abbreviation |
---|---|---|
0 | Unexpanded | U |
1 | Idle | I |
2 | Running | R |
3 | Removed | X |
4 | Completed | C |
5 | Held | H |
6 | Submission_err | E |
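
A small sketch of mapping JobStatus integers to the one-letter codes in the table above and tallying them. The `"?"` fallback for unknown values and the function names are my own assumptions, not anything Condor defines.

```python
from collections import Counter

# JobStatus values copied from the table above: value -> (name, letter).
JOB_STATUS = {
    0: ("Unexpanded", "U"),
    1: ("Idle", "I"),
    2: ("Running", "R"),
    3: ("Removed", "X"),
    4: ("Completed", "C"),
    5: ("Held", "H"),
    6: ("Submission_err", "E"),
}


def status_letter(value):
    """Return the one-letter code, or "?" (an assumption) if the value is unknown."""
    return JOB_STATUS.get(value, ("Unknown", "?"))[1]


def summarize(values):
    """Count how many jobs are in each state, keyed by letter."""
    return Counter(status_letter(v) for v in values)


if __name__ == "__main__":
    print(summarize([1, 1, 2, 5]))
```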
JobNotification in job ClassAds
Value | Name |
---|---|
0 | Never |
1 | Always |
2 | Complete |
3 | Error |
(Source: h/exit.h)
Value | Name | Description |
---|---|---|
4 | JOB_EXCEPTION | The job exited with an exception |
44 | DPRINTF_ERROR | There is a fatal error with dprintf() |
100 | JOB_EXITED | The job exited (not killed) |
101 | JOB_CKPTED | The job was checkpointed |
102 | JOB_KILLED | The job was killed |
103 | JOB_COREDUMPED | The job was killed and a core file produced |
105 | JOB_NO_MEM | Not enough memory to start the shadow |
106 | JOB_SHADOW_USAGE | incorrect arguments to condor_shadow |
107 | JOB_NOT_CKPTED | The job was kicked off without a checkpoint |
107 | JOB_SHOULD_REQUEUE | (!) We define this to the same number, since we want the same behavior. However, "JOB_NOT_CKPTED" doesn't mean much if we're not a standard universe job. The effect of this exit code is that we want the job to be put back in the job queue and run again. |
108 | JOB_NOT_STARTED | Can't connect to startd or request refused |
109 | JOB_BAD_STATUS | Job status != RUNNING on startup |
110 | JOB_EXEC_FAILED | Exec failed for some reason other than ENOMEM |
111 | JOB_NO_CKPT_FILE | There is no checkpoint file (lost) |
112 | JOB_SHOULD_HOLD | The job should be put on hold |
113 | JOB_SHOULD_REMOVE | The job should be removed |
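
Because JOB_NOT_CKPTED and JOB_SHOULD_REQUEUE share the value 107, a reverse lookup from number to name can only return one of them. The sketch below shows one way to handle that with a Python `IntEnum`, where the second name becomes an alias of the first. The class and helper names are hypothetical; only the values come from the table above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes copied from the h/exit.h table above."""
    JOB_EXCEPTION = 4
    DPRINTF_ERROR = 44
    JOB_EXITED = 100
    JOB_CKPTED = 101
    JOB_KILLED = 102
    JOB_COREDUMPED = 103
    JOB_NO_MEM = 105
    JOB_SHADOW_USAGE = 106
    JOB_NOT_CKPTED = 107
    JOB_SHOULD_REQUEUE = 107  # same value, so Python treats it as an alias
    JOB_NOT_STARTED = 108
    JOB_BAD_STATUS = 109
    JOB_EXEC_FAILED = 110
    JOB_NO_CKPT_FILE = 111
    JOB_SHOULD_HOLD = 112
    JOB_SHOULD_REMOVE = 113


def describe_exit(code):
    """Return the first-defined name for a code, or UNKNOWN_<n> if not listed."""
    try:
        return ExitCode(code).name
    except ValueError:
        return "UNKNOWN_%d" % code


def should_requeue(code):
    """True for 107, whichever of the two names the caller has in mind."""
    return code == ExitCode.JOB_SHOULD_REQUEUE


if __name__ == "__main__":
    for c in (100, 104, 107):
        print(c, describe_exit(c), should_requeue(c))
```

Note that 104 is absent from the table, so it falls through to the unknown case here.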
Event | Value |
---|---|
Submit | 0 |
Execute | 1 |
Executable error | 2 |
Checkpointed | 3 |
Job evicted | 4 |
Job terminated | 5 |
Image size | 6 |
Shadow exception | 7 |
Generic | 8 |
Job aborted | 9 |
Job suspended | 10 |
Job unsuspended | 11 |
Job held | 12 |
Job released | 13 |
Node execute | 14 |
Node terminated | 15 |
Post script terminated | 16 |
Globus submit | 17 |
Globus submit failed | 18 |
Globus resource up | 19 |
Globus resource down | 20 |
Remote error | 21 |
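
A sketch of looking up the event names above by number. The `event_from_line` helper assumes a log line that begins with the zero-padded event number followed by whitespace (for example `012 (...)`); that line format is an assumption on my part and should be checked against real log files. Both function names are hypothetical.

```python
# Event numbers copied from the table above.
EVENT_NAMES = {
    0: "Submit",
    1: "Execute",
    2: "Executable error",
    3: "Checkpointed",
    4: "Job evicted",
    5: "Job terminated",
    6: "Image size",
    7: "Shadow exception",
    8: "Generic",
    9: "Job aborted",
    10: "Job suspended",
    11: "Job unsuspended",
    12: "Job held",
    13: "Job released",
    14: "Node execute",
    15: "Node terminated",
    16: "Post script terminated",
    17: "Globus submit",
    18: "Globus submit failed",
    19: "Globus resource up",
    20: "Globus resource down",
    21: "Remote error",
}


def event_name(number):
    """Return the event name for a number, or a placeholder if not listed."""
    return EVENT_NAMES.get(number, "Unknown event %d" % number)


def event_from_line(line):
    """Decode the leading event number of a log line (assumed format, see above)."""
    first_field = line.split(None, 1)[0]
    return event_name(int(first_field))


if __name__ == "__main__":
    print(event_from_line("005 (example) job finished"))
```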
Shadow | Starter | Universe |
---|---|---|
jim | jim | PVM |
V6 | V5 | Standard |
V6.1 | V6.1 | Everything else |
From Globus 2.2.4, globus_gram_protocol-5.0/globus_gram_protocol_constants.h and globus_gram_protocol_error.c
Value | GLOBUS_GRAM_ PROTOCOL_ERROR_... | Description |
---|---|---|
0 | Success | |
1 | PARAMETER_NOT_SUPPORTED | one of the RSL parameters is not supported |
2 | INVALID_REQUEST | the RSL length is greater than the maximum allowed |
3 | NO_RESOURCES | an I/O operation failed |
4 | BAD_DIRECTORY | jobmanager unable to set default to the directory requested |
5 | EXECUTABLE_NOT_FOUND | the executable does not exist |
6 | INSUFFICIENT_FUNDS | of an unused INSUFFICIENT_FUNDS |
7 | AUTHORIZATION | authentication with the remote server failed |
8 | USER_CANCELLED | the user cancelled the job |
9 | SYSTEM_CANCELLED | the system cancelled the job |
10 | PROTOCOL_FAILED | data transfer to the server failed |
11 | STDIN_NOT_FOUND | the stdin file does not exist |
12 | CONNECTION_FAILED | the connection to the server failed (check host and port) |
13 | INVALID_MAXTIME | the provided RSL 'maxtime' value is not an integer |
14 | INVALID_COUNT | the provided RSL 'count' value is not an integer |
15 | NULL_SPECIFICATION_TREE | the job manager received an invalid RSL |
16 | JM_FAILED_ALLOW_ATTACH | the job manager failed in allowing others to make contact |
17 | JOB_EXECUTION_FAILED | the job failed when the job manager attempted to run it |
18 | INVALID_PARADYN | an invalid paradyn was specified |
19 | INVALID_JOBTYPE | the provided RSL 'jobtype' value is invalid |
20 | INVALID_GRAM_MYJOB | the provided RSL 'myjob' value is invalid |
21 | BAD_SCRIPT_ARG_FILE | the job manager failed to locate an internal script argument file |
22 | ARG_FILE_CREATION_FAILED | the job manager failed to create an internal script argument file |
23 | INVALID_JOBSTATE | the job manager detected an invalid job state |
24 | INVALID_SCRIPT_REPLY | the job manager detected an invalid script response |
25 | INVALID_SCRIPT_STATUS | the job manager detected an invalid script status |
26 | JOBTYPE_NOT_SUPPORTED | the provided RSL 'jobtype' value is not supported by this job manager |
27 | UNIMPLEMENTED | unused ERROR_UNIMPLEMENTED |
28 | TEMP_SCRIPT_FILE_FAILED | the job manager failed to create an internal script submission file |
29 | USER_PROXY_NOT_FOUND | the job manager cannot find the user proxy |
30 | OPENING_USER_PROXY | the job manager failed to open the user proxy |
31 | JOB_CANCEL_FAILED | the job manager failed to cancel the job as requested |
32 | MALLOC_FAILED | system memory allocation failed |
33 | DUCT_INIT_FAILED | the interprocess job communication initialization failed |
34 | DUCT_LSP_FAILED | the interprocess job communication setup failed |
35 | INVALID_HOST_COUNT | the provided RSL 'host count' value is invalid |
36 | UNSUPPORTED_PARAMETER | one of the provided RSL parameters is unsupported |
37 | INVALID_QUEUE | the provided RSL 'queue' parameter is invalid |
38 | INVALID_PROJECT | the provided RSL 'project' parameter is invalid |
39 | RSL_EVALUATION_FAILED | the provided RSL string includes variables that could not be identified |
40 | BAD_RSL_ENVIRONMENT | the provided RSL 'environment' parameter is invalid |
41 | DRYRUN | the provided RSL 'dryrun' parameter is invalid |
42 | ZERO_LENGTH_RSL | the provided RSL is invalid (an empty string) |
43 | STAGING_EXECUTABLE | the job manager failed to stage the executable |
44 | STAGING_STDIN | the job manager failed to stage the stdin file |
45 | INVALID_JOB_MANAGER_TYPE | the requested job manager type is invalid |
46 | BAD_ARGUMENTS | the provided RSL 'arguments' parameter is invalid |
47 | GATEKEEPER_MISCONFIGURED | the gatekeeper failed to run the job manager |
48 | BAD_RSL | the provided RSL could not be properly parsed |
49 | VERSION_MISMATCH | there is a version mismatch between GRAM components |
50 | RSL_ARGUMENTS | the provided RSL 'arguments' parameter is invalid |
51 | RSL_COUNT | the provided RSL 'count' parameter is invalid |
52 | RSL_DIRECTORY | the provided RSL 'directory' parameter is invalid |
53 | RSL_DRYRUN | the provided RSL 'dryrun' parameter is invalid |
54 | RSL_ENVIRONMENT | the provided RSL 'environment' parameter is invalid |
55 | RSL_EXECUTABLE | the provided RSL 'executable' parameter is invalid |
56 | RSL_HOST_COUNT | the provided RSL 'host_count' parameter is invalid |
57 | RSL_JOBTYPE | the provided RSL 'jobtype' parameter is invalid |
58 | RSL_MAXTIME | the provided RSL 'maxtime' parameter is invalid |
59 | RSL_MYJOB | the provided RSL 'myjob' parameter is invalid |
60 | RSL_PARADYN | the provided RSL 'paradyn' parameter is invalid |
61 | RSL_PROJECT | the provided RSL 'project' parameter is invalid |
62 | RSL_QUEUE | the provided RSL 'queue' parameter is invalid |
63 | RSL_STDERR | the provided RSL 'stderr' parameter is invalid |
64 | RSL_STDIN | the provided RSL 'stdin' parameter is invalid |
65 | RSL_STDOUT | the provided RSL 'stdout' parameter is invalid |
66 | OPENING_JOBMANAGER_SCRIPT | the job manager failed to locate an internal script |
67 | CREATING_PIPE | the job manager failed on the system call pipe() |
68 | FCNTL_FAILED | the job manager failed on the system call fcntl() |
69 | STDOUT_FILENAME_FAILED | the job manager failed to create the temporary stdout filename |
70 | STDERR_FILENAME_FAILED | the job manager failed to create the temporary stderr filename |
71 | FORKING_EXECUTABLE | the job manager failed on the system call fork() |
72 | EXECUTABLE_PERMISSIONS | the executable file permissions do not allow execution |
73 | OPENING_STDOUT | the job manager failed to open stdout |
74 | OPENING_STDERR | the job manager failed to open stderr |
75 | OPENING_CACHE_USER_PROXY | the cache file could not be opened in order to relocate the user proxy |
76 | OPENING_CACHE | cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space |
77 | INSERTING_CLIENT_CONTACT | the job manager failed to insert the contact in the client contact list |
78 | CLIENT_CONTACT_NOT_FOUND | the contact was not found in the job manager's client contact list |
79 | CONTACTING_JOB_MANAGER | connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ... |
80 | INVALID_JOB_CONTACT | the syntax of the job contact is invalid |
81 | UNDEFINED_EXE | the executable parameter in the RSL is undefined |
82 | CONDOR_ARCH | the job manager service is misconfigured. condor arch undefined |
83 | CONDOR_OS | the job manager service is misconfigured. condor os undefined |
84 | RSL_MIN_MEMORY | the provided RSL 'min_memory' parameter is invalid |
85 | RSL_MAX_MEMORY | the provided RSL 'max_memory' parameter is invalid |
86 | INVALID_MIN_MEMORY | the RSL 'min_memory' value is not zero or greater |
87 | INVALID_MAX_MEMORY | the RSL 'max_memory' value is not zero or greater |
88 | HTTP_FRAME_FAILED | the creation of a HTTP message failed |
89 | HTTP_UNFRAME_FAILED | parsing incoming HTTP message failed |
90 | HTTP_PACK_FAILED | the packing of information into a HTTP message failed |
91 | HTTP_UNPACK_FAILED | an incoming HTTP message did not contain the expected information |
92 | INVALID_JOB_QUERY | the job manager does not support the service that the client requested |
93 | SERVICE_NOT_FOUND | the gatekeeper failed to find the requested service |
94 | JOB_QUERY_DENIAL | the jobmanager does not accept any new requests (shutting down) |
95 | CALLBACK_NOT_FOUND | the client failed to close the listener associated with the callback URL |
96 | BAD_GATEKEEPER_CONTACT | the gatekeeper contact cannot be parsed |
97 | POE_NOT_FOUND | the job manager could not find the 'poe' command |
98 | MPIRUN_NOT_FOUND | the job manager could not find the 'mpirun' command |
99 | RSL_START_TIME | the provided RSL 'start_time' parameter is invalid |
100 | RSL_RESERVATION_HANDLE | the provided RSL 'reservation_handle' parameter is invalid |
101 | RSL_MAX_WALL_TIME | the provided RSL 'max_wall_time' parameter is invalid |
102 | INVALID_MAX_WALL_TIME | the RSL 'max_wall_time' value is not zero or greater |
103 | RSL_MAX_CPU_TIME | the provided RSL 'max_cpu_time' parameter is invalid |
104 | INVALID_MAX_CPU_TIME | the RSL 'max_cpu_time' value is not zero or greater |
105 | JM_SCRIPT_NOT_FOUND | the job manager is misconfigured, a scheduler script is missing |
106 | JM_SCRIPT_PERMISSIONS | the job manager is misconfigured, a scheduler script has invalid permissions |
107 | SIGNALING_JOB | the job manager failed to signal the job |
108 | UNKNOWN_SIGNAL_TYPE | the job manager did not recognize/support the signal type |
109 | GETTING_JOBID | the job manager failed to get the job id from the local scheduler |
110 | WAITING_FOR_COMMIT | the job manager is waiting for a commit signal |
111 | COMMIT_TIMED_OUT | the job manager timed out while waiting for a commit signal |
112 | RSL_SAVE_STATE | the provided RSL 'save_state' parameter is invalid |
113 | RSL_RESTART | the provided RSL 'restart' parameter is invalid |
114 | RSL_TWO_PHASE_COMMIT | the provided RSL 'two_phase' parameter is invalid |
115 | INVALID_TWO_PHASE_COMMIT | the RSL 'two_phase' value is not zero or greater |
116 | RSL_STDOUT_POSITION | the provided RSL 'stdout_position' parameter is invalid |
117 | INVALID_STDOUT_POSITION | the RSL 'stdout_position' value is not zero or greater |
118 | RSL_STDERR_POSITION | the provided RSL 'stderr_position' parameter is invalid |
119 | INVALID_STDERR_POSITION | the RSL 'stderr_position' value is not zero or greater |
120 | RESTART_FAILED | the job manager restart attempt failed |
121 | NO_STATE_FILE | the job state file doesn't exist |
122 | READING_STATE_FILE | could not read the job state file |
123 | WRITING_STATE_FILE | could not write the job state file |
124 | OLD_JM_ALIVE | old job manager is still alive |
125 | TTL_EXPIRED | job manager state file TTL expired |
126 | SUBMIT_UNKNOWN | it is unknown if the job was submitted |
127 | RSL_REMOTE_IO_URL | the provided RSL 'remote_io_url' parameter is invalid |
128 | WRITING_REMOTE_IO_URL | could not write the remote io url file |
129 | STDIO_SIZE | the standard output/error size is different |
130 | JM_STOPPED | the job manager was sent a stop signal (job is still running) |
131 | USER_PROXY_EXPIRED | the user proxy expired (job is still running) |
132 | JOB_UNSUBMITTED | the job was not submitted by original jobmanager |
133 | INVALID_COMMIT | the job manager is not waiting for that commit signal |
134 | RSL_SCHEDULER_SPECIFIC | the provided RSL scheduler specific parameter is invalid |
135 | STAGE_IN_FAILED | the job manager could not stage in a file |
136 | INVALID_SCRATCH | the scratch directory could not be created |
137 | RSL_CACHE | the provided 'gass_cache' parameter is invalid |
138 | INVALID_SUBMIT_ATTRIBUTE | the RSL contains attributes which are not valid for job submission |
139 | INVALID_STDIO_UPDATE_ATTRIBUTE | the RSL contains attributes which are not valid for stdio update |
140 | INVALID_RESTART_ATTRIBUTE | the RSL contains attributes which are not valid for job restart |
141 | RSL_FILE_STAGE_IN | the provided RSL 'file_stage_in' parameter is invalid |
142 | RSL_FILE_STAGE_IN_SHARED | the provided RSL 'file_stage_in_shared' parameter is invalid |
143 | RSL_FILE_STAGE_OUT | the provided RSL 'file_stage_out' parameter is invalid |
144 | RSL_GASS_CACHE | the provided RSL 'gass_cache' parameter is invalid |
145 | RSL_FILE_CLEANUP | the provided RSL 'file_cleanup' parameter is invalid |
146 | RSL_SCRATCH | the provided RSL 'scratch_dir' parameter is invalid |
147 | INVALID_SCHEDULER_SPECIFIC | the provided scheduler-specific RSL parameter is invalid |
148 | UNDEFINED_ATTRIBUTE | a required RSL attribute was not defined in the RSL spec |
149 | INVALID_CACHE | the gass_cache attribute points to an invalid cache directory |
150 | INVALID_SAVE_STATE | the provided RSL 'save_state' parameter has an invalid value |
151 | OPENING_VALIDATION_FILE | the job manager could not open the RSL attribute validation file |
152 | READING_VALIDATION_FILE | the job manager could not read the RSL attribute validation file |
153 | RSL_PROXY_TIMEOUT | the provided RSL 'proxy_timeout' is invalid |
154 | INVALID_PROXY_TIMEOUT | the RSL 'proxy_timeout' value is not greater than zero |
155 | STAGE_OUT_FAILED | the job manager could not stage out a file |
156 | JOB_CONTACT_NOT_FOUND | the job contact string does not match any which the job manager is handling |
157 | DELEGATION_FAILED | proxy delegation failed |
158 | LOCKING_STATE_LOCK_FILE | the job manager could not lock the state lock file |
159 | INVALID_ATTR | an invalid globus_io_clientattr_t was used. |
160 | NULL_PARAMETER | a null parameter was passed to the gram library |
161 | STILL_STREAMING | the job manager is still streaming output |
162 | LAST |
Value | GLOBUS_GRAM_ PROTOCOL_ JOB_STATE_... | Description |
---|---|---|
1 | PENDING | The job is waiting for resources to become available to run. |
2 | ACTIVE | The job has received resources and the application is executing. |
4 | FAILED | The job terminated before completion because of an error, user-triggered cancel, or system-triggered cancel. |
8 | DONE | The job completed successfully |
16 | SUSPENDED | The job has been suspended. Resources which were allocated for this job may have been released due to some scheduler-specific reason. |
32 | UNSUBMITTED | The job has not been submitted to the scheduler yet, pending the reception of the GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_REQUEST signal from a client. |
64 | STAGE_IN | The job manager is staging in files to run the job. |
128 | STAGE_OUT | The job manager is staging out files generated by the job. |
0xFFFFF | ALL | A mask of all job states. |
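
The job state values are distinct powers of two, so several states can be OR'ed together into a mask, and ALL is described as a mask of all job states. The following is a rough sketch of decoding such a mask; `states_in_mask` is a hypothetical helper, not a Globus function.

```python
# Job state values copied from the table above (names without the common prefix).
JOB_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}

JOB_STATE_ALL = 0xFFFFF  # mask of all job states


def states_in_mask(mask):
    """List the state names whose bit is set in mask, lowest bit first."""
    return [name for bit, name in sorted(JOB_STATES.items()) if mask & bit]


if __name__ == "__main__":
    terminal = 4 | 8  # FAILED or DONE
    print(states_in_mask(terminal))
    print(states_in_mask(JOB_STATE_ALL))
```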
Value | GLOBUS_GRAM_ PROTOCOL_ JOB_SIGNAL_... | Description |
---|---|---|
1 | CANCEL | Cancel a job |
2 | SUSPEND | Suspend a job |
3 | RESUME | Resume a previously suspended job |
4 | PRIORITY | Change the priority of a job |
5 | COMMIT_REQUEST | Signal the job manager to commence with a job submission if the job request was accompanied by the (two_phase=yes) RSL attribute. |
6 | COMMIT_EXTEND | Signal the job manager to wait an additional number of seconds (specified by an integer value string as the signal's argument) before timing out a two-phase job commit. |
7 | STDIO_UPDATE | Signal the job manager to change the way it is currently handling standard output and/or standard error. The argument for this signal is an RSL containing new stdout, stderr, stdout_position, stderr_position, or remote_io_url relations. |
8 | STDIO_SIZE | Signal the job manager to verify that streamed I/O has been completely received. The argument to this signal contains the number of bytes of stdout and stderr received, separated by a space. The reply to this signal will be a SUCCESS message if these matched the amount sent by the job manager. Otherwise, an error reply indicating GLOBUS_GRAM_PROTOCOL_ERROR_STDIO_SIZE is returned. If standard output and standard error are merged, only one number should be sent as an argument to this signal. An argument of -1 for either stream size indicates that the client is not interested in the size of that stream. |
9 | STOP_MANAGER | Signal the job manager to stop managing the current job and terminate. The job continues to run as normal. The job manager will send a state change callback with the job status being FAILED and the error GLOBUS_GRAM_PROTOCOL_ERROR_JM_STOPPED. |
10 | COMMIT_END | Signal the job manager to clean up after the completion of the job if the job RSL contained the (two_phase=yes) relation. |
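
The STDIO_SIZE description above spells out an argument format: two byte counts separated by a space, a single number when stdout and stderr are merged, and -1 for a stream whose size the client does not care about. Here is a sketch of building that argument string. The function name and the use of `None` to mean "merged streams" are my own conventions, and actually sending the signal is left out.

```python
# Signal values copied from the table above (names without the common prefix).
JOB_SIGNALS = {
    "CANCEL": 1,
    "SUSPEND": 2,
    "RESUME": 3,
    "PRIORITY": 4,
    "COMMIT_REQUEST": 5,
    "COMMIT_EXTEND": 6,
    "STDIO_UPDATE": 7,
    "STDIO_SIZE": 8,
    "STOP_MANAGER": 9,
    "COMMIT_END": 10,
}

NOT_INTERESTED = -1  # per the table: client does not care about this stream's size


def stdio_size_argument(stdout_bytes, stderr_bytes=None):
    """Build the STDIO_SIZE argument string (hypothetical helper).

    Pass stderr_bytes=None when stdout and stderr are merged, in which
    case only one number is sent.
    """
    if stderr_bytes is None:
        return "%d" % stdout_bytes
    return "%d %d" % (stdout_bytes, stderr_bytes)


if __name__ == "__main__":
    print(JOB_SIGNALS["STDIO_SIZE"], repr(stdio_size_argument(1024, 0)))
    print(JOB_SIGNALS["STDIO_SIZE"], repr(stdio_size_argument(NOT_INTERESTED, 5)))
```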
Service | Port |
---|---|
condor negotiator | 9614 (obsolete, dynamic in 6.7.x) |
condor collector | 9618 |
GT2 gatekeeper | 2119 |
gridftp | 2811 |
GT4 web services | 8443 |