Technical Specification


Command Line Arguments

The CL-MW library accepts the following command line arguments. They are stripped from argv before argv is handed to the master or slave algorithm.

--mw-help
Emit the usage and exit.
--mw-version-string
Emit the version string and exit.
--mw-master
Run the executable in Master Mode. Required if --mw-slave is not set and must be first on the command line.
--mw-slave
Run the executable in Slave Mode. Required if --mw-master is not set and must be first on the command line.
--mw-master-host ip-address-or-hostname
In Master Mode, this is the interface (either the hostname or the ip address) to which the master should bind; it is also emitted to the resource file if any such file is written. In Slave Mode, it is the hostname or ip address to which the slave process should connect to get work.
--mw-master-port port
The port to which the slave should connect for work.
--mw-max-write-buffer-size size-in-bytes
The maximum size, in bytes, the network write buffer may reach before a write is rejected.
--mw-max-read-buffer-size size-in-bytes
The maximum size, in bytes, the network read buffer may reach before a read is rejected.
--mw-client-timeout seconds
How many seconds the master waits for a client to respond when the master is expecting a response.
--mw-audit-file filename
A file in which the audit trail of the process is stored.
--mw-resource-file filename
Describes the resources needed by the master for a higher level batch system to honor.
When in master mode: The master writes this file. The attributes it contains are listed in the Resource File section.
When in slave mode: Determine the master-host, master-port, and member-id to which the slave should connect by reading them from the resource file. The ordering of this command line option in relation to --mw-master-host, --mw-master-port, and --mw-member-id is important. If --mw-master-host, --mw-master-port, and/or --mw-member-id are specified before this argument, then the resource file will overwrite the command line specification, and vice versa. If the resource file does not exist, its absence is ignored (with a warning) provided --mw-master-host and --mw-master-port are present.
--mw-resource-file-update-interval seconds
How many seconds between updating the resource file with current information.
--mw-slave-task-group positive-integer
How many tasks are grouped into a network packet being sent to a slave process. If the packet is larger than the maximum size of the read buffer of the slave, the slave will abort the read. Defaults to 1.
--mw-slave-result-group positive-integer
How many completed results should be grouped into a network packet being sent from the slave to the master. If the packet is larger than the maximum size of the read buffer for the master, then the master will abort the connection to the slave. Defaults to 1.
--mw-member-id string
This is a token which must match between the slave and the master. It is used to insecurely identify a working group consisting of a master and its slaves. In a harsh environment with many masters and slaves going up and down, this acts as a simple sanity check that the correct slaves are connected to the correct master process. The default is the string "default-member-id".
--mw-slave-executable path-to-executable
This specifies the absolute path to a slave executable. It is used when writing the resource file only.

The API

The CL-MW library is in the :CL-MW package and it is used by the application package built on top of CL-MW. The exported symbols in the :CL-MW package are:

The Task Algorithm

(define-mw-algorithm name (parameters*) &body body) Macro

Defines a task algorithm with name name. The arguments passed to this call are exactly those which were passed into the mw-funcall-name form for the task algorithm.

Limitation: The parameters list is restricted to being required parameters only.

Limitation: A task algorithm may not return multiple values or a function or closure. The latter restriction is due to the inability to serialize a closure from the slave to the master.

Task Computation Function Generated by define-mw-algorithm
(name parameters*) Function
This is the function which actually performs the work of the task algorithm. It accepts the parameters specified and returns the last expression in the body supplied to the task algorithm macro.
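
As a minimal sketch of the two forms above, assuming the CL-MW library is already loaded into the Lisp image: the package name mw-sketch and the algorithm name add-pair are hypothetical and chosen only for illustration.

```lisp
;; Assumption: CL-MW is already loaded into the image.
;; MW-SKETCH and ADD-PAIR are hypothetical names for illustration only.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

;; Required parameters only; the body returns a single serializable
;; value (a number), never a closure or multiple values.
(define-mw-algorithm add-pair (a b)
  (+ a b))

;; The generated task computation function can be called directly,
;; which is convenient for checking the algorithm without any slaves.
(add-pair 1 2)
```

Calling the computation function locally like this exercises only the body of the task algorithm; no task is submitted to CL-MW.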

Task Submission Macro Generated by define-mw-algorithm
(mw-funcall-name (parameters) &key sid tag do-it-anyway (retry t)) Macro
This is a destructuring macro which inserts a single new task of the task algorithm named by name into CL-MW. The parameters are in the same order as the parameter list for the defined task algorithm and are evaluated before being packed into the task structure. The keyword parameters together constitute the task policy for a submitted task.

sid SLAVE-ID
Send the task to a specific slave denoted by SLAVE-ID. If NIL, this task is considered :unordered, otherwise it is an :ordered task.
tag FORM
A form which will appear unchanged in the result structure associated with the computed task. The default is NIL.
do-it-anyway [T or NIL]
If the task was an :ordered task and its slave disconnected, this controls whether the task is moved into the :unordered group (T) or becomes unrunnable (NIL). By default, :ordered tasks become unrunnable if the associated slave is disconnected.
retry [T or NIL]
If an :unordered task was assigned to a slave and the slave went away, this controls whether the task is retried on a different slave (T) or becomes unrunnable (NIL). For :ordered tasks, :do-it-anyway is consulted only if this test passes. The default is T.
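
The task policy keywords can be sketched as follows. This assumes CL-MW is loaded; mw-sketch, add-pair, and submit-examples are hypothetical names, and the tags are self-evaluating values so the example does not depend on whether the tag form is evaluated. The function is meant to be called from inside a master algorithm.

```lisp
;; Assumption: CL-MW is already loaded. All names defined here are
;; hypothetical and exist only for this illustration.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(define-mw-algorithm add-pair (a b)
  (+ a b))

(defun submit-examples (slave-id)
  "Submit one :unordered and one :ordered ADD-PAIR task.
SLAVE-ID is a slave id obtained from CL-MW, for example from
MW-GET-CONNECTED-ORDERED-SLAVES."
  ;; No :sid, so this is an :unordered task that any slave may run.
  ;; The tag 1 comes back unchanged in the result structure.
  (mw-funcall-add-pair (1 2) :tag 1)
  ;; :sid makes this an :ordered task for SLAVE-ID. With
  ;; :do-it-anyway t it moves to the :unordered group instead of
  ;; becoming unrunnable if that slave disconnects.
  (mw-funcall-add-pair (3 4) :sid slave-id :tag 2 :do-it-anyway t))
```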

Specific Target Number API Generated by define-mw-algorithm

(mw-set-target-number-for-name value) Function

Sets the target number for the task algorithm specific to name to value, which is clamped to zero or greater. This represents the maximum number of pending tasks for this task algorithm that the master algorithm would like to keep in memory at once. This target number is advisory and the master algorithm can insert more tasks than indicated by the target number. The default target number for any specific task algorithm is 0.

(mw-get-target-number-for-name) Function

Returns the target number for the number of desired tasks to keep in memory for the task algorithm specific to name.

(mw-pending-tasks-for-name) Function

Return how many tasks are in memory (and not running on any slaves) specific to the task algorithm name.

(mw-upto-target-number-name) Function

Returns the number of tasks the master algorithm would have to create in order to reach the desired target number for task algorithm name.
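
A minimal sketch of how the specific target number functions might be combined to keep a bounded number of tasks in memory. It assumes CL-MW is loaded and that the function is called from inside a master algorithm; add-pair and top-up-add-pair-tasks are hypothetical names.

```lisp
;; Assumption: CL-MW is already loaded. All names defined here are
;; hypothetical and exist only for this illustration.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(define-mw-algorithm add-pair (a b)
  (+ a b))

(defun top-up-add-pair-tasks (target)
  "Set the ADD-PAIR target number to TARGET, then submit just enough
tasks to reach it. Returns the number of tasks submitted."
  (mw-set-target-number-for-add-pair target)
  (let ((needed (mw-upto-target-number-add-pair)))
    (dotimes (i needed)
      ;; The arguments are placeholders; a real application would
      ;; draw them from its own work queue.
      (mw-funcall-add-pair (i 1)))
    needed))
```

Because the target number is advisory, nothing prevents the master algorithm from submitting more tasks than this function does.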

The General Target Number API
(mw-set-target-number value) Function
Sets the general target number for all tasks regardless of task algorithm. This is only advisory and more tasks could be created into CL-MW by the master algorithm.

(mw-get-target-number) Function

Return the current value of the general task target number. The default value for the general target number is 0.

(mw-pending-tasks) Function

Return how many tasks of any kind are waiting to be scheduled to slaves.

(mw-upto-target-number) Function

Return how many tasks of any kind should be created by the master in order to reach the general target number for all tasks.

The Master Algorithm
(define-mw-master (argv) &body body) Macro
Defines the master algorithm for the application of which there may only be one. When the master algorithm has finished computation, it must return an integer from 0 to 255 which will become the return code of the process. If this doesn't happen, the return integer will be 255.

Note: If no master algorithm is specified in a CL-MW application, an audit line will be emitted stating this fact and the master computation will shut down immediately. The return code will be 255 in this case.

Parameter argv will be the command line arguments passed to the executable or to mw-initialize with the CL-MW specific arguments stripped out.

(mw-master-loop &key (timeout .05)) Function

Enter the CL-MW system loop processing I/O and other library tasks until one or more of these events happen:

When one or more of these events happen the function will return the 4 values:

  1. Number of unrunnable tasks
  2. Number of ready results
  3. Number of newly connected and unprocessed ordered slaves
  4. Number of newly disconnected and unprocessed ordered slaves

Parameter timeout is a time unit in real seconds which should be waited in the Network IO multiplexing library before timing out due to inactivity. In the case of this function, it means we perform bookkeeping work inside of the CL-MW library and enter back into the loop if no meaningful events occurred. Setting this value too low will result in excessive CPU usage by the master process.

(mw-master-loop-iterate &key (timeout .05)) Function

Enter the CL-MW system loop processing a single pass of network I/O and other library tasks. After this call one or more of the same events as described in mw-master-loop may have happened.

Parameter timeout is a time unit in real seconds which should be waited in the Network IO multiplexing library before timing out due to inactivity. In the case of this function, it means we return the 4 values as described in mw-master-loop. Setting this value too low could result in excessive CPU utilization.

(mw-get-unrunnable-tasks) Function

Return all currently unrunnable task structures in a list or NIL if none.

(mw-get-results) Function

Return all currently finished result structures in a list or NIL if none.
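
The pieces described so far can be combined into a small master algorithm. The following is a sketch under stated assumptions, not the library's reference example: CL-MW is assumed to be loaded, add-pair is a hypothetical task algorithm, and the choice to return 1 when any task became unrunnable is an illustrative convention.

```lisp
;; Assumption: CL-MW is already loaded. MW-SKETCH and ADD-PAIR are
;; hypothetical names for illustration only.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(define-mw-algorithm add-pair (a b)
  (+ a b))

(define-mw-master (argv)
  (format t "master arguments: ~S~%" argv)
  (let ((outstanding 10)   ; tasks submitted but not yet accounted for
        (failed 0))        ; tasks that became unrunnable
    ;; Queue all tasks up front, tagging each with its index.
    (dotimes (i outstanding)
      (mw-funcall-add-pair (i 1) :tag i))
    ;; Drive the system loop until every task is accounted for.
    (while (plusp outstanding)
      (mw-master-loop)
      (dolist (result (mw-get-results))
        (format t "task ~A => ~A~%"
                (mw-result-tag result) (mw-result-packet result))
        (decf outstanding))
      ;; Consume unrunnable tasks too, otherwise the loop never ends.
      (dolist (task (mw-get-unrunnable-tasks))
        (format t "task ~A is unrunnable~%" (mw-task-tag task))
        (incf failed)
        (decf outstanding)))
    ;; The master algorithm must return an integer from 0 to 255.
    (if (zerop failed) 0 1)))
```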

(mw-get-connected-ordered-slaves) Function

If any connected ordered slaves are ready for use, return the list of their slave ids, or NIL if there are none. In practice each slave id is a string, but it should be treated as an opaque value that uniquely identifies a slave. Use equal to check equality between slave ids.

(mw-get-disconnected-ordered-slaves) Function

If any ordered slaves have become disconnected, return a list of their slave ids. You may use equal to compare against other slave ids.

(mw-allocate-slaves &key (amount 1000) (kind :unordered)) Function

There are three kinds of groups for which slaves can be allocated: :ordered, :intermingle, :unordered. When a slave initially connects for work, it is placed into one of the three groups. The order of group fulfillment is :ordered, :intermingle, :unordered. If both :ordered and :intermingle are full, then any connecting slaves go over to the :unordered group. The total number of desired slaves for all groups is written into the resource file as the number of needed slaves. This function can cause slaves in the :unordered group to move to the groups desired.

It is valid for the :unordered group to contain more than the allocation for it. The default allocation for all groups is 0.

(mw-deallocate-slaves &key (amount 0) (kind :unordered)) Function

This does not stop any slaves from processing tasks, but it lowers the number of slaves desired, clamped to zero, for whichever of the groups :unordered, :intermingle, or :ordered is specified. This affects what the master process writes into the resource file.

(mw-free-slave slave-id) Function

Move the slave specified by slave-id into the :unordered group after it completes whatever tasks it may be running and adjust the desired slave amounts for the group the slave was in. This does not evict or otherwise stop currently allocated tasks from running on that slave. The slave's group is only changed once all of the tasks it is currently running are computed.
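
One way the ordered slave functions might fit together is sketched below. It assumes CL-MW is loaded and that these helpers are called from a master algorithm, with process-ordered-slave-events invoked each time mw-master-loop returns. The variable *known-slaves* and both helper functions are hypothetical.

```lisp
;; Assumption: CL-MW is already loaded. All names defined here are
;; hypothetical and exist only for this illustration.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(define-mw-algorithm add-pair (a b)
  (+ a b))

(defvar *known-slaves* '()
  "Slave ids of the ordered slaves currently believed to be connected.")

(defun request-ordered-slaves (n)
  "Ask CL-MW for N slaves in the :ordered group."
  (mw-allocate-slaves :amount n :kind :ordered))

(defun process-ordered-slave-events ()
  "Record newly connected and disconnected ordered slaves."
  (dolist (sid (mw-get-connected-ordered-slaves))
    ;; Slave ids are opaque, so compare them with EQUAL.
    (pushnew sid *known-slaves* :test #'equal)
    ;; Pin a first task to the newly connected slave.
    (mw-funcall-add-pair (1 2) :sid sid :tag :warm-up))
  (dolist (sid (mw-get-disconnected-ordered-slaves))
    (setf *known-slaves* (remove sid *known-slaves* :test #'equal)))
  *known-slaves*)
```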

(mw-num-runnable-tasks) Function

Returns the number of runnable tasks, including tasks that were sent out and are currently executing on slaves.

(mw-num-unrunnable-tasks) Function

Returns the number of unrunnable tasks waiting to be consumed out of CL-MW with mw-get-unrunnable-tasks.

The Slave Algorithm
(define-mw-slave (argv) &body body) Macro
Defines the slave algorithm for the application of which there may only be one. When the slave algorithm has finished computation, it must return an integer from 0 to 255 which will become the return code of the process. If this doesn't happen, the return integer will be 255.

Parameter argv will be the command line arguments passed to the executable or to mw-initialize with the CL-MW specific arguments stripped out.

Note: If no slave algorithm is specified in a CL-MW application, then the default slave algorithm defined in listing 5.1 is used. An audit line entry will occur stating that the CL-MW default slave algorithm is being used.

\begin{lisp}[label=default-slave-algorithm,caption=Default Slave Algorithm]
(define-mw-slave (argv)
  (mw-slave-loop-simple))
\end{lisp}

(mw-slave-loop &key (timeout .05)) Function

Process all pending tasks and return control to the slave algorithm.

This function will return 6 values in this order:
master-disconnect
Did the master close the connection to the client (or under some conditions CL-MW wanted to immediately exit due to some problem in the environment). T if the master cut the connection or the library wanted to exit, NIL otherwise.
explicit-shutdown
Did the master send a shutdown command to the slave according to the master/slave protocol? T if it did and NIL if it didn't.
total-results-completed
The number of total results which have been completely processed by the slave.
num-tasks
A number which is how many tasks are yet to be processed.
num-results
The number of results that are currently waiting to be sent back. This is affected by the master process with the command line parameter --mw-slave-result-group.
result-grouping
The number of results which must be grouped together before being sent back. If there are no more tasks to compute, whatever results are pending are sent back regardless.

Parameter timeout is a time unit in real seconds which should be waited in the Network IO multiplexing library before timing out due to inactivity. In the case of this function, it means we perform bookkeeping work inside of the CL-MW library and then return into the slave algorithm. Setting this value too low will result in excessive CPU usage by the slave process.
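
A custom slave algorithm built on mw-slave-loop might look like the sketch below. It assumes CL-MW is loaded; the package name is hypothetical, and the exit codes mirror the 0 and 255 convention documented for mw-slave-loop-simple rather than anything mw-slave-loop itself requires.

```lisp
;; Assumption: CL-MW is already loaded. MW-SKETCH is a hypothetical
;; package name for illustration only.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(define-mw-slave (argv)
  (format t "slave arguments: ~S~%" argv)
  (loop
    ;; Bind the six documented return values in their documented order.
    (multiple-value-bind (master-disconnect explicit-shutdown
                          total-results-completed num-tasks
                          num-results result-grouping)
        (mw-slave-loop)
      (declare (ignorable num-tasks num-results result-grouping))
      (format t "results completed so far: ~A~%" total-results-completed)
      (cond
        ;; The master asked for a shutdown: exit successfully.
        (explicit-shutdown (return 0))
        ;; The connection was cut or the library wants to exit.
        (master-disconnect (return 255))))))
```

The return from the loop becomes the value of the slave algorithm body, and therefore the return code of the process.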

(mw-slave-loop-iterate &key (timeout .0001)) Function

Process a single pending task, inspect the network buffers for more work to do, and return control to the slave algorithm. This will generally be extremely slow and hence has a short timeout. It returns the same values as mw-slave-loop and there may or may not have been any new tasks sent by the master in that time.

Parameter timeout is a time unit in real seconds which should be waited in the Network IO multiplexing library before timing out due to inactivity.

(mw-slave-loop-simple &key (timeout .05)) Function

Process all pending tasks from the master and wait for more. Return only when the master says to shut down, returning 0, or when a serious error occurred, returning 255.

Parameter timeout is a time unit in real seconds which should be waited in the Network IO multiplexing library before timing out. In the case of this function, it means we perform bookkeeping work inside of the CL-MW library and begin waiting again for more tasks from the master, or a shutdown command. Setting this value too low will result in excessive CPU usage by the slave process.

The Task Structure
(mw-task-sid task-structure) Function
Returns the slave-id for which the task-structure was destined. If the task is :unordered, then NIL is returned.

(mw-task-tag task-structure) Function

Return the associated tag object for this task-structure, or NIL if not set.

(mw-task-packet task-structure) Function

Retrieve, as a list, the arguments specific to the algorithm for which this task-structure was created.

The Result Structure
(mw-result-algorithm result-structure) Function
Return an uppercase string which is the name of the task algorithm that produced this result-structure.

(mw-result-sid result-structure) Function

Return the slave id of the slave which produced this result-structure.

(mw-result-tag result-structure) Function

Retrieve the unmodified tag associated with the original task-structure for this result-structure.

(mw-result-compute-time result-structure) Function

Return the length of time in seconds which represents how long it took to compute this result-structure.

(mw-result-packet result-structure) Function

Retrieve the actual returned form of the task algorithm which produced this result-structure.
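
The accessors for both structures can be exercised with two small reporting helpers. This is a sketch assuming CL-MW is loaded; summarize-result and summarize-unrunnable-task are hypothetical names, and their arguments are expected to come from mw-get-results and mw-get-unrunnable-tasks respectively.

```lisp
;; Assumption: CL-MW is already loaded. All names defined here are
;; hypothetical and exist only for this illustration.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(defun summarize-result (result)
  "Return a one-line description of a result structure."
  (format nil "~A on slave ~A: tag ~S, value ~S, ~A seconds"
          (mw-result-algorithm result)
          (mw-result-sid result)
          (mw-result-tag result)
          (mw-result-packet result)
          (mw-result-compute-time result)))

(defun summarize-unrunnable-task (task)
  "Return a one-line description of an unrunnable task structure."
  (format nil "unrunnable: tag ~S, arguments ~S, intended slave ~S"
          (mw-task-tag task)
          (mw-task-packet task)
          (mw-task-sid task)))
```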

Miscellaneous API
(mw-initialize (argv &key (system-argv sb-ext:*posix-argv*))) Function
The entry point into CL-MW. The parameter argv is a list of strings which represent the argument list to the library. Anything not a CL-MW specific argument will be passed to the master algorithm or the slave algorithm in the same order as it was on the command line.
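
For illustration, a master might be started from a running Lisp image roughly as follows. This sketch assumes CL-MW is loaded and that mw-initialize takes the argument list as its single positional argument. The function name, the audit file name, and the --my-app-flag argument are hypothetical; the latter stands in for an application argument that would be passed through to the master algorithm.

```lisp
;; Assumption: CL-MW is already loaded. START-MASTER, the audit file
;; name, and --my-app-flag are hypothetical.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(defun start-master ()
  "Enter CL-MW in Master Mode with a hand-written argument list."
  (mw-initialize '("--mw-master"              ; must be first
                   "--mw-master-host" "localhost"
                   "--mw-slave-task-group" "10"
                   "--mw-audit-file" "master.audit"
                   ;; Not a CL-MW argument, so it is handed to the
                   ;; master algorithm in argv.
                   "--my-app-flag")))
```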

(mw-version-string) Function

Return a string which represents the version number for this library.

Note: The format and meaning of this string may change in the future.

(mw-zero-clamp value) Function

If the value is less than zero, then return 0, otherwise return the value.

(mw-dump-exec &key (exec-name "a.out") ignore-libs remap-libs) Function

Produce an executable named exec-name, which is a.out by default, and copy any shared libraries needed by the application into the current working directory.

Any shared libraries loaded in the lisp image which are already an absolute path will be copied verbatim to the current working directory. Any unqualified libraries will be transformed by an algorithm approximating the search algorithm of dlopen() into absolute paths and then copied to the current working directory. The dumped shared libraries must be shipped with the executable to the target machine.

The parameter ignore-libs is a list of strings where each string is an unqualified library name. These libraries will be ignored by mw-dump-exec. If this parameter is NIL, the default, then no libraries are ignored.

The parameter remap-libs is an association list of strings where the first string is an unqualified library name and the second an absolute path to a library that will be copied to the current working directory in place of what is found in the lisp image. If this parameter is NIL, the default, then no libraries are remapped.

This interface may change in the future.

Limitation: The dumped libraries must exist in the current working directory when the executable is run.
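
A build step using mw-dump-exec might be sketched as below, assuming CL-MW and the application are already loaded. The function name, the executable name, and the ignored library name are hypothetical. The remap-libs parameter is left out because the exact shape of its association list entries is not shown here.

```lisp
;; Assumption: CL-MW and the application are already loaded.
;; BUILD-EXECUTABLE, "add-pair-app", and "libexample.so" are
;; hypothetical.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(defun build-executable ()
  "Dump the current image as an executable in the working directory."
  (mw-dump-exec :exec-name "add-pair-app"
                ;; Unqualified library names that should not be
                ;; copied next to the executable.
                :ignore-libs '("libexample.so")))
```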

(while test-expr &body body) Macro

A ubiquitous macro which implements the usual "while" loop control flow.
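
A short sketch exercising while together with mw-zero-clamp, assuming CL-MW is loaded. The function countdown is hypothetical and has no role in the library.

```lisp
;; Assumption: CL-MW is already loaded. COUNTDOWN is a hypothetical
;; helper for illustration only.
(defpackage #:mw-sketch
  (:use #:cl #:cl-mw))
(in-package #:mw-sketch)

(defun countdown (start)
  "Collect START, START-1, ..., 1 into a list.
A negative START is clamped to zero, which yields the empty list."
  (let ((n (mw-zero-clamp start))
        (seen '()))
    ;; Loop for as long as the test expression is true.
    (while (plusp n)
      (push n seen)
      (decf n))
    (nreverse seen)))
```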


Resource File

Each form in the resource file is a two item list where the first item is the attribute name as a keyword, and the second an arbitrary Lisp form whose schema depends upon the specific attribute. They take the form of:

(keyword form)

The current attributes for the resource file in this version of CL-MW are:

:computation-status
The value is either the keyword :in-progress or the keyword :finished. It indicates whether the master algorithm considers the computation finished. If a slave reads a resource file in which :computation-status is :finished, it will exit immediately with a status of zero.

:timestamp
The value is an integer which represents the universal time when the file was written.

:member-id
The value is a string which must match in the master and slave.

:update-interval
The value is an integer which represents the number of seconds since the timestamp after which the resource file will be re-written. The default is 300 seconds.

:slaves-needed
This value represents the raw number of slaves the master algorithm has requested in order to complete its task.

:slave-executable
This value is a list where the first element is a string representing the full path to the executable which is the slave executable, and the second element is a list of strings representing full paths to any shared libraries that have to be moved along with the executable.

:slave-arguments
This value is a list of strings which are the command line arguments, in order, with which the slave is to be spawned.

An example file:

\begin{lisp}[caption=Contents of a sample \textit{resource file}\xspace ]
;; Sta...
...e" "--mw-master-host" "black"
"--mw-master-port" "47416"))
\end{lisp}
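
Because each entry is a plain (keyword form) list, a higher level batch system written in Lisp could read the file with the standard reader. The sketch below uses only standard Common Lisp and does not depend on CL-MW. The function names and the sample text are hypothetical, and the sample values are made up for illustration rather than taken from a real run.

```lisp
(defun read-resource-forms (stream)
  "Read every (keyword form) entry from STREAM and return an
association list of (keyword . form) pairs."
  (let ((*read-eval* nil)      ; never evaluate #. forms from a file
        (eof (list :eof)))     ; unique end-of-file marker
    (loop for entry = (read stream nil eof)
          until (eq entry eof)
          collect (cons (first entry) (second entry)))))

(defun resource-attribute (name entries)
  "Look up attribute NAME in ENTRIES, or return NIL if it is absent."
  (cdr (assoc name entries)))

;; Made-up sample text in the documented (keyword form) layout.
(defparameter *sample-resource-text*
  ";; hypothetical sample
(:computation-status :in-progress)
(:member-id \"default-member-id\")
(:update-interval 300)
(:slaves-needed 4)")

(defun sample-slaves-needed ()
  "Parse the sample text and return its :slaves-needed value."
  (with-input-from-string (in *sample-resource-text*)
    (resource-attribute :slaves-needed (read-resource-forms in))))
```

To read a real resource file, the same read-resource-forms function can be given a stream opened with with-open-file.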

Peter Keller 2010-11-02