CL-MW: A Master/Slave Distributed Computing Library

Written by: Peter Keller (psilord at cs.wisc.edu)

Home Page Blog Home Lisp Software, Tutorials, Hacks

Description

CL-MW is a Common Lisp library to aid in the construction of master/slave applications.

Its features include:

Supported Platforms

Currently CL-MW operates with SBCL on Linux and uses IOLib as its networking library. Patches will be accepted for other lisp implementations and environments. However, I only have resources to test on SBCL on Linux.

Download

Live Sources. The HEAD is the development version. All official release tags are present.

git clone http://pages.cs.wisc.edu/~psilord/lisp-public/public-repos-lisp/cl-mw.git

CL-MW is Quicklisp installable! Just type:

* (ql:quickload "cl-mw")

to download, install, and load the system. Sometimes the Quicklisp repository might have CL-MW at a revision or two behind the official release just because Quicklisp is updated about once a month or so.

CL-MW is licensed under the Apache License, Version 2.0.

Stable Releases

Release Version Tag Tarball Manual Release Date
0.3 cl-mw-0.3.tar.gz HTML PDF Tue Mar 27 21:56:50 CDT 2012
0.2 cl-mw-0.2.tar.gz HTML PDF Sun Mar 13 00:45:20 CST 2011
0.1 cl-mw-0.1.tar.gz HTML PDF Wed Nov 3 00:36:26 CDT 2010

Example CL-MW Application

Here is an example hello world application. Suppose it is in a file called "hello-cl-mw.lisp".

;; Load the CL-MW library into the lisp image
(require :cl-mw)

(defpackage #:hello-cl-mw
  (:use #:cl #:alexandria #:cl-mw)
  (:export #:mw-master
           #:mw-slave
           ;; Re-export this so we can dump an executable.
           #:mw-dump-exec ))

(in-package :hello-cl-mw)

;; Define a simple slave algorithm...
(define-mw-algorithm hello (str)
  (concatenate 'string "Hello World: " str))

;; When mw is initialized and it knows this is the master, run this entry
;; point, which will manage the processing loop.
(define-mw-master (argv)
    (declare (ignorable argv))
    (unwind-protect
         (let ((num-tasks 10)
               (num-results 0))

           ;; queue up some tasks to do.
           (dotimes (x num-tasks)
             (let ((str (format nil "Task ~A" x)))
               (mw-funcall-hello (str))))

           ;; While we have some outstanding results to compute...
           (while (/= num-results num-tasks)

             ;; Drive the system until some results are produced or
             ;; sequential slaves acquired/disconnected.
             (mw-master-loop)

             ;; Process the results, if any
             (when-let ((results (mw-get-results)))
               (dolist (result results)
                 (let ((payload (mw-result-packet result)))
                   ;; Keep track of the results so we know when we're done.
                   (incf num-results)

                   (format t "Got result from slave: ~S~%" payload)))))

           ;; exit value for master algo
           0)

      ;; uw-p cleanup form.
      (format t "Master algo cleanup form.~%")))

;; The optional Slave Algorithm which doesn't need to be specified at all,
;; but is for this example program.
;;
;; When mw is initialized and it knows this is a slave, simply start up
;; the slave, connect to the server, and start processing tasks.
(define-mw-slave (argv)
    (declare (ignorable argv))
    (unwind-protect

         ;; This returns 0 as an exit value.
         (mw-slave-loop-simple)

      ;; cleanup form for slave
      (format t "Slave algo cleanup form.~%")))

Here is a sample creation of an application binary of the above code. We start SBCL without the debugger so if there is a problem in the program it will dump a stack trace and exit right away instead of waiting for user input.

> sbcl --disable-debugger
;; sbcl welcome message

* (load "hello-cl-mw")
;; lots of output loading libraries etc, etc, etc

T
* (use-package #:hello-cl-mw)

T
* (mw-dump-exec :exec-name "./hello-cl-mw")
######################################
# Processing loaded shared libraries #
######################################
Shared-library: /home/psilord/content/dotfiles/.cache/common-lisp/sbcl-1.0.43.82-linux-x86/home/psilord/content/code/lisp/clbuild/source/iolib/src/syscalls/libiolib-syscalls.so...dumping...fixating.
Shared-library: librt.so...looking up...found /usr/lib/librt.so...dumping...fixating.

########################################################
#  Please package these libraries with your executable #
########################################################
./librt.so
./libiolib-syscalls.so

####################################
# Writing Master/Slave executable #
####################################
[undoing binding stack and other enclosing state... done]
[saving current Lisp image into ./hello-cl-mw:
writing 3512 bytes from the read-only space at 0x01000000
writing 2256 bytes from the static space at 0x01100000
writing 40574976 bytes from the dynamic space at 0x09000000
done]

[lisp image exits]

At this point the binary which is the master and also the slave has been produced. Any shared libraries loaded by the application will have been dumped next to the application. This is handy when you need to move the binary and its required libraries to other machines for execution.

> ls 
total 80516
-rw------- 1 psilord psilord     2574 Nov  4 21:37 hello-cl-mw.lisp
-rwxr-xr-x 1 psilord psilord 41201692 Nov  4 21:37 hello-cl-mw*
-rw------- 1 psilord psilord     7273 Nov  4 21:37 libiolib-syscalls.so
-rw------- 1 psilord psilord    30684 Nov  4 21:37 librt.so

When we start the master process, we ask to drop a resource file which encapsulates how to connect to the master process. We could instead pass a lot of command line arguments specifying these things, but this is easier in practice. By default, the master process will send one task per assignment of work to a slave and the slave will send one response back per completion of a task. These are configurable to minimize the network cost back and forth cost, but we leave them at their defaults for this example.

When we start these executables, we see a lot of output lines with "[A]" in them. These lines are part of the audit trail of what the master or slave was doing. They can be redirected to an audit file, but in this example are left to intermingle with the output of the program.

> ./hello-cl-mw --mw-master --mw-resource-file master.rsc
11/04/2010 21:43:45 [A] VERSION CL-MW: Version 0.1
11/04/2010 21:43:45 [A] INIT MASTER "default-member-id"
11/04/2010 21:43:45 [A] MASTER READY 192.168.0.100:57305

In another window, we start the slave. It uses the specified resource file to learn where to connect to the master process.

> ./hello-cl-mw --mw-slave --mw-resource-file master.rsc

The master assigns work to the slave the master continues to send tasks until all of them are finished. At this point, here is the rest of the output of the master:

11/04/2010 21:43:59 [A] NEW-CLIENT -> 192.168.0.100:45146
11/04/2010 21:43:59 [A] SLAVE-0 192.168.0.100:45146 -> ["default-member-id"] :connecting [:unordered]
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 0"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 1"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 2"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 3"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 4"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 5"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 6"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 7"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 8"
11/04/2010 21:44:00 [A] SLAVE-0 <- 1 tasks
11/04/2010 21:44:00 [A] SLAVE-0 -> :busy
11/04/2010 21:44:00 [A] SLAVE-0 -> 1 results
11/04/2010 21:44:00 [A] SLAVE-0 -> :idle
Got result from slave: "Hello World: Task 9"
Master algo cleanup form.
11/04/2010 21:44:00 [A] SLAVE-0 <- TRY-SHUTDOWN
11/04/2010 21:44:00 [A] SLAVE-0 -> :shutting-down
11/04/2010 21:44:00 [A] SLAVE-0 -> :disconnected
11/04/2010 21:44:00 [A] EOF -> 192.168.0.100:45146
11/04/2010 21:44:01 [A] FINI SHUTDOWN "default-member-id"

And here is the output of the slave.

11/04/2010 21:43:59 [A] VERSION CL-MW: Version 0.1
11/04/2010 21:43:59 [A] INIT SLAVE "default-member-id"
11/04/2010 21:43:59 [A] MASTER <- CONNECTED TO 192.168.0.100:57305 FROM 192.168.0.100:45146
11/04/2010 21:44:00 [A] MASTER -> ID SLAVE-0
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> 1 tasks (1 grouping)
11/04/2010 21:44:00 [A] MASTER <- 1 results
11/04/2010 21:44:00 [A] MASTER -> SHUTDOWN
11/04/2010 21:44:00 [A] MASTER -> CEOF
11/04/2010 21:44:00 [A] EOF -> 192.168.0.100:57305
Slave algo cleanup form.
11/04/2010 21:44:01 [A] FINI MASTER-SHUTDOWN "default-member-id"

If you'd like to test the master and slave in a REPL, add these two functions to the hello-cl-mw.lisp file. These call the function mw-initialize (which is the main entry point to CL-MW) with an appropriate ARGV list for the master and the slave process. These connect to localhost, but you can supply a real hostname or ip address there if you desire. In these helper functions, we increase the grouping of the tasks to the slave and the results to the master in one network communication round trip to 10.

;; Used for REPL testing purposes to make things easier to type.
(defun mw-master ()
  (mw-initialize '("--mw-master" "--mw-slave-task-group" "10"
                   "--mw-master-host" "localhost" 
                   "--mw-slave-result-group" "10")))

(defun mw-slave (port)
  (mw-initialize `("--mw-slave" "--mw-master-host" "localhost"
                                "--mw-master-port"
                                ,(format nil "~D" port))))

After calling mw-master on one REPL, you'd call mw-slave in the other REPL passing in the port to which the master bound itself.

It is recommended that you have one unique running lisp image for each master and slave process when testing in the REPL.

May this software serve your needs.