A CL-MW application uses the :CL-MW package and lives in its own package, whose name is chosen by the application author. A CL-MW application has three parts: one or more task algorithms, a single master algorithm, and a single slave algorithm.
The purpose of this minimal example is to show how to write a task algorithm, a master algorithm, and a slave algorithm. The master algorithm creates tasks and processes the results returned by one or more slaves that connect to the master process. The task algorithm simply concatenates a fixed string with its string argument and returns the result. Both the master and the slave processes are assumed to run on the same machine, and both bind to the localhost interface.
We start with the unsurprising ASDF file for the hello-world CL-MW application.
In the next listing, mw-master and mw-slave are helper functions used for testing and debugging in the REPL. Notice that we re-export the :CL-MW symbol mw-dump-exec from our application package; this makes it easy to save the Lisp image as an executable later.
For documentation purposes, we split the single main implementation file into parts containing the task algorithm, the master algorithm, and the slave algorithm.
The task algorithm accepts a regular Lisp string and also returns one.
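The listing for the task algorithm wraps its body in CL-MW's task-definition form, which is not reproduced here. As a minimal sketch of only the string manipulation, the body can be written as an ordinary function. The name hello-world-string is hypothetical, and the "Hello World: " prefix is taken from the results the master prints later in this section.

```lisp
;;; Sketch of the task algorithm's body as a plain function.
;;; HELLO-WORLD-STRING is a hypothetical name; the real example registers
;;; its task through CL-MW rather than with a bare DEFUN.
(defun hello-world-string (str)
  "Return a new string made of a fixed prefix followed by STR."
  (check-type str string)
  (concatenate 'string "Hello World: " str))

;; (hello-world-string "Task 0") => "Hello World: Task 0"
```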
The master algorithm submits 10 tasks to CL-MW and then repeatedly calls mw-master-loop until 10 results have been processed. When mw-master-loop returns, one or more of the following CL-MW functions will return meaningful data, depending upon the application: mw-get-unrunnable-tasks, mw-get-results, mw-get-connected-ordered-slaves, and mw-get-disconnected-ordered-slaves.
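CL-MW's networking cannot be shown in a few lines, but the shape of the master algorithm's control flow can. The following is a single-process toy model, not CL-MW code: toy-master-loop is a hypothetical stand-in for mw-master-loop that "runs" a few pending tasks per call and hands back their results, and the outer loop stops once as many results have been processed as tasks were created.

```lisp
;;; Toy, single-process model of the master algorithm's control flow.
;;; Nothing here is CL-MW API: TOY-MASTER-LOOP is a hypothetical stand-in
;;; for MW-MASTER-LOOP that evaluates up to BATCH pending tasks per call.

(defun toy-master-loop (pending batch)
  "Run up to BATCH task closures from PENDING.
Return two values: the list of results and the tasks still pending."
  (let ((results '()))
    (loop repeat batch
          while pending
          do (push (funcall (pop pending)) results))
    (values (nreverse results) pending)))

(defun toy-master (num-tasks &key (batch 3))
  "Create NUM-TASKS tasks, then call TOY-MASTER-LOOP until NUM-TASKS
results have been processed.  Return the results in task order."
  (check-type batch (integer 1))
  (let ((pending
          (loop for i below num-tasks
                collect (let ((arg (format nil "Task ~D" i)))
                          ;; Each task closes over its own argument string.
                          (lambda ()
                            (concatenate 'string "Hello World: " arg)))))
        (processed '()))
    (loop until (= (length processed) num-tasks)
          do (multiple-value-bind (results remaining)
                 (toy-master-loop pending batch)
               (setf pending remaining)
               (dolist (r results)
                 (format t "Got result from slave: ~S~%" r)
                 (push r processed))))
    (nreverse processed)))
```

Calling (toy-master 10) prints ten lines of the form Got result from slave: "Hello World: Task 0", mirroring the master output shown later. In the real example the tasks run on a remote slave and results are retrieved through functions such as mw-get-results; only the "loop until every result is in" structure carries over.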
The slave algorithm is very simple in our case. The function mw-slave-loop-simple loops inside of CL-MW, processing tasks until the master tells the slave to shut down; at that point mw-slave-loop-simple returns 0 and the slave exits with that return code. We could have omitted this slave algorithm altogether and used the default slave algorithm of a CL-MW application; we included it here to demonstrate how to write one.
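To make the "loop until told to shut down, then return 0" behavior concrete, here is a toy model of such a loop. It is a sketch under stated assumptions, not CL-MW's implementation: the message list stands in for the network connection to the master, and the message representation and the name toy-slave-loop are invented for illustration.

```lisp
;;; Toy model of a simple slave loop.  Not CL-MW API: messages are
;;; represented here as conses, (:task . argument) or (:shutdown).

(defun toy-slave-loop (messages task-fn)
  "Apply TASK-FN to each :TASK message's argument until a :SHUTDOWN
message arrives.  Return two values: an exit code (0 on shutdown, NIL if
the messages ran out first) and the list of results in order."
  (let ((results '()))
    (dolist (msg messages (values nil (nreverse results)))
      (ecase (car msg)
        (:task (push (funcall task-fn (cdr msg)) results))
        (:shutdown (return (values 0 (nreverse results))))))))
```

In a real slave the 0 would become the process exit code, as in the FINI SHUTDOWN output shown later.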
We also define two helper functions, mw-master and mw-slave, which are not part of CL-MW, nor technically part of the application, but which make it easier to debug and test the application in the REPL.
Now let's run this example in the REPL to see how it works. First, we set up and run the master process by calling our master helper function. The helper configures the master to send tasks to an idle slave in groups of 10 and to expect results back from a slave in groups of 10; without this setting, one task is sent at a time and each result is sent back individually. Grouping tasks and results reduces the number of network messages, which makes communication more efficient. The master is told to start up on the localhost interface; there is no way to start the master bound to all interfaces.
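The effect of grouping on the number of messages can be seen with a small sketch: splitting ten tasks with a group size of 10 yields a single packet, as in the "10 tasks (10 grouping)" line of the slave's log, whereas a group size of 1 yields ten. The helper group-tasks is hypothetical and only illustrates the arithmetic; it does not reproduce CL-MW's internal packing code.

```lisp
;;; Sketch of task grouping: split a list of tasks into packets of at
;;; most GROUP-SIZE tasks.  GROUP-TASKS is a hypothetical helper, not
;;; part of CL-MW.
(defun group-tasks (tasks group-size)
  (check-type group-size (integer 1))
  (loop while tasks
        collect (loop repeat group-size
                      while tasks
                      collect (pop tasks))))

;; (group-tasks '(0 1 2 3 4 5 6 7 8 9) 3) => ((0 1 2) (3 4 5) (6 7 8) (9))
```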
In the log output below, the member id token of both the master and the slave is "default-member-id". In normal use, this should be changed to a value unique to the specific master/slave computation. Please see the section on command line arguments for how to do this.
    > sbcl
    This is SBCL 1.0.39.16, an implementation of ANSI Common Lisp.
    More information about SBCL is available at <http://www.sbcl.org/>.
    SBCL is free software, provided as is, with absolutely no warranty.
    It is mostly in the public domain; some portions are provided under
    BSD-style licenses. See the CREDITS and COPYING files in the
    distribution for more information.
    * (require :cl-mw.examples.hello-world)
    [ Lots of output on compiling and loading libraries ]
    * (use-package :cl-mw.examples.hello-world)
    T
    * (mw-master)
    07/20/2010 23:15:42 [A] INIT MASTER "default-member-id"
    07/20/2010 23:15:42 [A] MASTER READY 127.0.0.1:52942
At this point, the master process has already created the hello world tasks and is waiting for slaves to connect. The output lines containing [A] are emitted by the CL-MW library as an audit trail. "default-member-id" is the membership token of the master, which the slave must match. Let's start up a slave with our slave helper function and pass it the port number of the master process; for this example, the helper assumes the master is running on localhost.
    > sbcl
    This is SBCL 1.0.39.16, an implementation of ANSI Common Lisp.
    More information about SBCL is available at <http://www.sbcl.org/>.
    SBCL is free software, provided as is, with absolutely no warranty.
    It is mostly in the public domain; some portions are provided under
    BSD-style licenses. See the CREDITS and COPYING files in the
    distribution for more information.
    * (require :cl-mw.examples.hello-world)
    [ Lots of output on compiling and loading libraries ]
    * (use-package :cl-mw.examples.hello-world)
    T
    * (mw-slave 52942)
    07/20/2010 23:23:46 [A] INIT SLAVE "default-member-id"
    07/20/2010 23:23:47 [A] MASTER <- CONNECTED TO 127.0.0.1:52942 \
        FROM 127.0.0.1:47768
    07/20/2010 23:23:47 [A] MASTER -> ID SLAVE-0
    07/20/2010 23:23:47 [A] MASTER -> 10 tasks (10 grouping)
    07/20/2010 23:23:47 [A] MASTER <- 10 results
    07/20/2010 23:23:47 [A] MASTER -> SHUTDOWN
    Slave algo cleanup form.
    07/20/2010 23:23:47 [A] FINI SHUTDOWN "default-member-id"
    0
    *
The last number is the return code of the slave function.
Meanwhile, let's see what the master emitted:
    07/20/2010 23:23:47 [A] NEW-CLIENT -> 127.0.0.1:47768
    07/20/2010 23:23:47 [A] SLAVE-0 127.0.0.1:47768 -> \
        ["default-member-id"] \
        :connecting [:unordered]
    07/20/2010 23:23:47 [A] SLAVE-0 -> :idle
    07/20/2010 23:23:47 [A] SLAVE-0 <- 10 tasks
    07/20/2010 23:23:47 [A] SLAVE-0 -> :busy
    07/20/2010 23:23:47 [A] SLAVE-0 -> 10 results
    07/20/2010 23:23:47 [A] SLAVE-0 -> :idle
    Got result from slave: "Hello World: Task 0"
    Got result from slave: "Hello World: Task 1"
    Got result from slave: "Hello World: Task 2"
    Got result from slave: "Hello World: Task 3"
    Got result from slave: "Hello World: Task 4"
    Got result from slave: "Hello World: Task 5"
    Got result from slave: "Hello World: Task 6"
    Got result from slave: "Hello World: Task 7"
    Got result from slave: "Hello World: Task 8"
    Got result from slave: "Hello World: Task 9"
    Master algo cleanup form.
    07/20/2010 23:23:47 [A] SLAVE-0 <- TRY-SHUTDOWN
    07/20/2010 23:23:47 [A] SLAVE-0 -> :shutting-down
    07/20/2010 23:23:47 [A] SLAVE-0 -> :disconnected
    07/20/2010 23:23:47 [A] EOF -> 127.0.0.1:47768
    07/20/2010 23:23:47 [A] FINI SHUTDOWN "default-member-id"
    0
    *
Note: Some audit lines above have been wrapped to fit. The trailing backslashes only mark these wraps; the actual audit lines do not contain shell-style line continuation characters.
We see that the master packaged all ten tasks into one packet and sent it to the slave. After getting the results back, also in one packet, it printed them out. At this point the number of results equals the number of tasks, so the master algorithm returns. CL-MW then enters the shutdown phase, in which it actively tries to shut down all known slaves and then exits with the return code the master algorithm produced. If a severe problem arises during shutdown, the return code is set to 255.
The :CL-MW package exports the function mw-dump-exec, which saves the Lisp image as an executable in the current working directory. We recommend re-exporting this function from the application package built on top of :CL-MW, as shown previously in the package definition for this example. Re-exporting it makes producing an executable trivial: one just requires the package, calls use-package on it, and then calls mw-dump-exec to produce the binary.
mw-dump-exec simplifies collecting required libraries that may not be present on the slave system. It copies every currently loaded library that has an absolute path into the current working directory. For libraries without a path, it approximates the search algorithm used by dlopen() to find an absolute path for the library, and then copies the library to the current working directory. With the :ignore-libs keyword argument, mw-dump-exec can be told to ignore specific libraries loaded by the Lisp image; one supplies a list of strings naming the unqualified libraries to ignore. With the :remap-libs keyword argument, libraries can also be remapped from their unqualified name to a specific path; one supplies an association list mapping unqualified library names to absolute paths. Ignoring a library takes precedence over remapping it, and remapping a library takes precedence over automatic detection of its absolute path. mw-dump-exec updates the Lisp image so that, when the saved executable starts, it looks for the dumped libraries in the path ./.
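The precedence rules can be summarized in a short sketch. resolve-library is hypothetical, and its search-dirs default is a deliberately crude stand-in for the dlopen() search that mw-dump-exec approximates; the point is only the order in which :ignore-libs, :remap-libs, an already absolute path, and the search are consulted.

```lisp
;;; Sketch of the precedence rules: ignore > remap > absolute path > search.
;;; RESOLVE-LIBRARY and the SEARCH-DIRS default are hypothetical; the real
;;; mw-dump-exec approximates the dlopen() search more thoroughly.
(defun resolve-library (path &key ignore-libs remap-libs
                                  (search-dirs '("/lib/" "/usr/lib/")))
  "Return :IGNORED, a remapped path, an absolute path, or NIL if not found."
  (let ((name (file-namestring path)))          ; unqualified library name
    (cond ((member name ignore-libs :test #'string=) :ignored)
          ((cdr (assoc name remap-libs :test #'string=)))
          ((and (plusp (length path)) (char= (char path 0) #\/)) path)
          (t (loop for dir in search-dirs
                   for candidate = (concatenate 'string dir name)
                   when (probe-file candidate)
                     return candidate)))))

;; (resolve-library "librt.so"
;;                  :ignore-libs '("librt.so")
;;                  :remap-libs '(("librt.so" . "/opt/lib/librt.so")))
;; => :IGNORED, because ignoring wins over remapping.
```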
How the Lisp image is started before the executable is produced is important. We start SBCL with the --disable-debugger option, which tells SBCL to print a stack trace and exit when something goes wrong in the executable, such as an unhandled condition being signaled. Otherwise, SBCL drops into an interactive debugging session and waits for input. Disabling the debugger prevents an executable that has hit a problem from wasting valuable compute time on a resource while it waits for input that will never come.
Dropping into the debugger is one of several aspects of the execution environment that can be altered with SBCL command line options. Another common adjustment is the size of the heap in the Lisp runtime, which is set with SBCL's --dynamic-space-size runtime option. The default runtime heap size is operating system specific. On the Linux machine on which I developed CL-MW it was 512MB, so each invocation of the master or slave executable requests about 512MB of memory from the operating system, even if the application does not use all of it. Depending upon your master and task algorithms, you may need to tune the runtime heap size to fit the computation's requirements. Please see the SBCL manual for more tunable options.
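When tuning, it helps to confirm what heap size an image actually received. In recent SBCL versions, sb-ext:dynamic-space-size returns the size of the dynamic space in bytes. The sketch below wraps it in a hypothetical helper that converts the value to megabytes, for example so a master or slave algorithm could log it at startup; on other Lisp implementations the helper simply returns NIL.

```lisp
;;; Report the heap (dynamic space) size of the running image in MB.
;;; SB-EXT:DYNAMIC-SPACE-SIZE is SBCL-specific; HEAP-SIZE-MEGABYTES is a
;;; hypothetical helper and returns NIL on other implementations.
(defun heap-size-megabytes ()
  #+sbcl (values (floor (sb-ext:dynamic-space-size) (* 1024 1024)))
  #-sbcl nil)
```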
    > sbcl --disable-debugger
    This is SBCL 1.0.39.16, an implementation of ANSI Common Lisp.
    More information about SBCL is available at <http://www.sbcl.org/>.
    SBCL is free software, provided as is, with absolutely no warranty.
    It is mostly in the public domain; some portions are provided under
    BSD-style licenses. See the CREDITS and COPYING files in the
    distribution for more information.
    * (require :cl-mw.examples.hello-world)
    [ Lots of output on compiling and loading libraries ]
    * (use-package :cl-mw.examples.hello-world)
    T
    * (mw-dump-exec)
    ######################################
    # Processing loaded shared libraries #
    ######################################
    Shared-library: /home/psilord/content/code/lisp/clbuild/source\
    /iolib/src/syscalls/libiolib-syscalls.so...\
    dumping...fixating.
    Shared-library: librt.so...looking up...found \
    /usr/lib/librt.so...dumping...fixating.
    ########################################################
    # Please package these libraries with your executable #
    ########################################################
    ./librt.so
    ./libiolib-syscalls.so
    ####################################
    # Writing Master/Slave executable #
    ####################################
    [undoing binding stack and other enclosing state... done]
    [saving current Lisp image into ./a.out:
    writing 3512 bytes from the read-only space at 0x01000000
    writing 2256 bytes from the static space at 0x01100000
    writing 38322176 bytes from the dynamic space at 0x09000000
    done]
mw-dump-exec iterated over all of the shared libraries used by the Lisp image. It determined that the shared library libiolib-syscalls.so, used by IOLib (a package needed by CL-MW), must be included, so it copied the actual library file into the current working directory and adjusted the Lisp image to look for libiolib-syscalls.so in the path ./. mw-dump-exec also noticed that librt.so did not have an absolute path, but its search found one (/usr/lib/librt.so). This library is likewise copied to the current working directory and the Lisp image adjusted to find it there. Unless told otherwise, mw-dump-exec names the binary it dumps a.out. You can supply a different executable name to mw-dump-exec; see the documentation of mw-dump-exec for details.
If you run this executable once with the same arguments that the helper function mw-master passes to mw-initialize, and run a second copy with the arguments that the helper function mw-slave passes to mw-initialize (adjusted for the master's host and port!), you will see output similar to the above as the slave executes the master's tasks.
Here we see the executable and the shared libraries with which it should be bundled when moved to another machine for execution:
    > ls a.out *.so
    -rwxr-xr-x 1 psilord psilord 38948892 Jul 20 23:44 a.out*
    -rw------- 1 psilord psilord 7235 Jul 20 23:44 libiolib-syscalls.so
    -rw------- 1 psilord psilord 30684 Jul 20 23:44 librt.so
Important: Any dumped shared libraries must be in the current working directory when the main binary is invoked; otherwise the restarted binary will not find them. Relative paths and the LD_LIBRARY_PATH environment variable do not work properly for this.
Both the master and the slave processes can write their audit trail to a specified file with the --mw-audit-file file-name command line option. When this option is used, every line shown above containing [A] is written to the specified audit file. Any other output that the master or slave algorithm produces goes to *standard-output*, or wherever that stream is bound.
Limitation: The audit files do not rotate and can grow without bound. If an audit file already exists when the master or slave process starts, it is appended to.
CL-MW does only minimal statistics bookkeeping, but the audit files can be used to answer questions about the application's run: for example, how many slaves connected, what the slave churn rate was, which subnets the slaves were on, or how many tasks were processed in a given time interval.
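As a sketch of such post-processing, the functions below count NEW-CLIENT events and sum the task counts in lines like "SLAVE-0 <- 10 tasks" from a master's audit file. Both helper names are hypothetical, and the line format is inferred only from the sample master output in this section, so the parsing would need revisiting if CL-MW's audit format changes.

```lisp
;;; Sketch of mining a master's audit file.  The line format is inferred
;;; from the sample output in this section; AUDIT-SUMMARY and
;;; READ-AUDIT-LINES are hypothetical helpers, not part of CL-MW.

(defun read-audit-lines (path)
  "Return the lines of the audit file at PATH as a list of strings."
  (with-open-file (in path)
    (loop for line = (read-line in nil)
          while line
          collect line)))

(defun audit-summary (lines)
  "Return two values: the number of NEW-CLIENT events and the total
number of tasks sent to slaves (lines like '... SLAVE-0 <- 10 tasks')."
  (let ((clients 0) (tasks 0))
    (dolist (line lines (values clients tasks))
      (when (search "[A] NEW-CLIENT" line)
        (incf clients))
      ;; Task dispatches look like "<slave> <- N tasks"; other "<-"
      ;; lines (e.g. TRY-SHUTDOWN) have no leading integer and are skipped.
      (let* ((arrow (search " <- " line))
             (n (and arrow
                     (parse-integer line :start (+ arrow 4)
                                         :junk-allowed t))))
        (when (and n (search " tasks" line :start2 (+ arrow 4)))
          (incf tasks n))))))
```

With an audit file written via --mw-audit-file (say, a hypothetical master.audit), one would call (audit-summary (read-audit-lines "master.audit")). For the master run shown above this would report 1 new client and 10 dispatched tasks.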
Note: The format of the audit file may change in a future revision of CL-MW.
Peter Keller 2012-03-27