A friend of mine was asking how to write a small interpreter so that he
could define new AI functions for his project to use at runtime. After
his face went blank when I started explaining how to do it in
Common Lisp (really, there isn't much to explain; doing such things is
a fundamental property of Common Lisp), he hastily mentioned that he
wanted it in C.
Oh.
So, I hacked together a trivial demonstration program. It runs a very
small interpreter that lets you compile a C file into a shared object,
load the shared object, and bind the functions inside it to a structure
full of function pointers, which you can then invoke manually. The idea
is that you write variants of your functions as separate C files that
you can load and swap out at runtime.
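Once the compiling and loading are set aside, the core idea is plain function-pointer indirection. The sketch below illustrates this. MiniFuncs, default_fun1(), and replacement_fun1() are hypothetical names. In the real program the replacement pointer would come from dlsym(). The call site stays the same while the function behind the table entry is swapped:

```c
/* Same prototype as the FUNC type in stuff.c. */
typedef int (*FUNC)(int a, int b);

/* Hypothetical one-entry version of the Funcs structure in stuff.c. */
typedef struct {
    FUNC fun1;
} MiniFuncs;

/* Hypothetical implementations: a default, and a replacement that in the
   real program would be a pointer returned by dlsym(). */
static int default_fun1(int a, int b) { return a + b; }
static int replacement_fun1(int a, int b) { return a * b; }

/* The call site never changes; only the pointer stored in the table does. */
int call_fun1(const MiniFuncs *t, int a, int b)
{
    return t->fun1(a, b);
}
```

With a table initialized as { default_fun1 }, call_fun1(&t, 3, 4) returns 7. After t.fun1 = replacement_fun1, the same call returns 12.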
Here is the lame makefile, which builds a program called stuff from
stuff.c.
# Makefile
stuff: stuff.c
	gcc -Wall -g stuff.c -o stuff -ldl

clean:
	rm -f stuff *.o *.so
Here is stuff.c. The program follows a traditional interpreter design:
a read-eval-print loop that dispatches each command to its own
evaluator. However, it is totally barebones, and I don't deal with the
interpreter environment in any meaningful way (other than reifying it as
a single global environment structure), since you can't define new
variables or functions in the interpreter. Also, the lexical analysis
and parsing of the interpreted forms are horrific at best. This is
because doing such things in C is a pain in the ass unless you use flex
and bison or are prepared to write a helluva lot more code. However, if
I did that, this wouldn't be the simple demonstration that it is.
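Short of flex and bison, one modest improvement over the sscanf()-and-strncmp() parsing is to split each input line into whitespace-separated tokens first. This also disposes of the trailing newline that fgets() leaves behind. The sketch below shows such a tokenizer. It is not part of stuff.c. tokenize() and MAX_TOKENS are hypothetical names:

```c
#include <string.h>

#define MAX_TOKENS 8

static const char WHITESPACE[] = " \t\r\n";

/* Split line in place into whitespace-separated tokens, storing pointers
   to them in tokens[]. Leading, trailing, and repeated whitespace is
   skipped. Returns the number of tokens found (at most max_tokens);
   anything beyond max_tokens is ignored. */
int tokenize(char *line, char *tokens[], int max_tokens)
{
    int count = 0;
    char *p = line;

    while (count < max_tokens) {
        p += strspn(p, WHITESPACE);     /* skip whitespace before a token */
        if (*p == '\0')
            break;
        tokens[count++] = p;
        p += strcspn(p, WHITESPACE);    /* move to the end of the token */
        if (*p == '\0')
            break;
        *p++ = '\0';                    /* terminate the token, step past it */
    }
    return count;
}
```

A line such as invoke   fun1 10 20 that has extra spaces and a trailing newline produces the four tokens invoke, fun1, 10, and 20. Each evaluator could then check the token count for arity instead of relying on a format string.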
Note that I chose to link to the loaded library functions through an
explicit indirection: the f structure in the Env structure. I could
have just taken the func_name variable in eval_invoke(), performed a
dlsym() call on it, and called the resulting pointer with the
arguments. Had I done that, I could have called ANY function in the
loaded library (well, any with the same prototype, at any rate), which
is more general in some respects. However, I chose the indirection
because it lets me associate functions with different C linkage names
with the symbols I use to identify them in the interpreter. For
example, the default functions are named stub1(), stub2(), and stub3(),
while foo.c and bar.c name theirs fun1(), fun2(), and fun3().
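For comparison, the direct approach described above might look roughly like the following sketch. lookup_direct() and invoke_direct() are hypothetical names. Nothing here can verify that the resolved symbol really has the int (int, int) prototype, which is part of why I preferred the explicit indirection:

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*FUNC)(int a, int b);

/* Resolve func_name directly in the loaded library. Returns NULL if there
   is no library or no such symbol. The prototype is simply assumed. */
FUNC lookup_direct(void *lib_handle, const char *func_name)
{
    void *sym;

    if (lib_handle == NULL)
        return NULL;
    dlerror();                          /* clear any stale error state */
    sym = dlsym(lib_handle, func_name);
    if (dlerror() != NULL)
        return NULL;                    /* symbol not found */
    /* POSIX requires this void * to function pointer conversion to work. */
    return (FUNC)sym;
}

/* Call func_name(a, b) and store its value in *result.
   Returns 0 on success, -1 if the function could not be resolved. */
int invoke_direct(void *lib_handle, const char *func_name,
                  int a, int b, int *result)
{
    FUNC fn = lookup_direct(lib_handle, func_name);

    if (fn == NULL) {
        printf("No function named %s in the loaded library.\n", func_name);
        return -1;
    }
    *result = fn(a, b);
    return 0;
}
```

With this approach, invoke_direct(e->lib_handle, func_name, arg0, arg1, &ret) could replace the if/else chain in eval_invoke(). Clearing dlerror() before dlsym() and checking it afterwards distinguishes a missing symbol from one whose address happens to be NULL.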
A real-world example of why the method I chose is useful would be
wanting multiple implementations of C functions with the exact same
name loaded at the same time, so that I could pick and choose between
them. With the method I chose, I could additionally associate a
namespace (or package name) with a shared object and use another syntax
in the interpreter to state which function I want to call out of which
namespace/package. Concretely, I'd pair the f and lib_handle fields into
a 'Package' structure and keep a hash table of them in the Env, keyed
by a package name specified when loading the shared object. This is
left as an exercise for the reader.
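As a starting point for that exercise, here is one possible shape for the Package idea. It rests on a few assumptions of my own. A fixed-size array with linear search stands in for the hash table. The qualified syntax is pkg:func, as in invoke foo:fun1 10 10. Package, PkgEnv, find_package(), split_qualified(), resolve_qualified(), and sample_add() are all hypothetical:

```c
#include <string.h>

#define MAX_PACKAGES 16
#define NAME_SIZE 64

/* Same types as in stuff.c. */
typedef int (*FUNC)(int a, int b);

typedef struct Funcs_t {
    FUNC fun1;
    FUNC fun2;
    FUNC fun3;
} Funcs;

/* Hypothetical: one loaded shared object plus the functions bound from it. */
typedef struct Package_t {
    char name[NAME_SIZE];   /* package name given when loading */
    void *lib_handle;       /* handle from dlopen(); unused in this sketch */
    Funcs f;
} Package;

/* A linear array stands in for the hash table keyed by package name. */
typedef struct PkgEnv_t {
    Package pkgs[MAX_PACKAGES];
    int count;
} PkgEnv;

/* Hypothetical stand-in for a function bound from a loaded library. */
static int sample_add(int a, int b) { return a + b; }

Package *find_package(PkgEnv *e, const char *name)
{
    for (int i = 0; i < e->count; i++) {
        if (strcmp(e->pkgs[i].name, name) == 0)
            return &e->pkgs[i];
    }
    return NULL;
}

/* Split "pkg:func" into its halves (buffers of NAME_SIZE bytes). Returns 1
   on success, 0 if there is no ':' or either half is empty or too long. */
int split_qualified(const char *sym, char *pkg, char *func)
{
    const char *colon = strchr(sym, ':');
    size_t plen, flen;

    if (colon == NULL)
        return 0;
    plen = (size_t)(colon - sym);
    flen = strlen(colon + 1);
    if (plen == 0 || plen >= NAME_SIZE || flen == 0 || flen >= NAME_SIZE)
        return 0;
    memcpy(pkg, sym, plen);
    pkg[plen] = '\0';
    memcpy(func, colon + 1, flen + 1);  /* copies the terminator too */
    return 1;
}

/* Resolve "pkg:funN" to a function pointer, or NULL if it is unknown. */
FUNC resolve_qualified(PkgEnv *e, const char *sym)
{
    char pkg[NAME_SIZE], func[NAME_SIZE];
    Package *p;

    if (!split_qualified(sym, pkg, func))
        return NULL;
    if ((p = find_package(e, pkg)) == NULL)
        return NULL;
    if (strcmp(func, "fun1") == 0) return p->f.fun1;
    if (strcmp(func, "fun2") == 0) return p->f.fun2;
    if (strcmp(func, "fun3") == 0) return p->f.fun3;
    return NULL;
}
```

Loading a package would use dlopen() and dlsym() as eval_load() does. The difference is that the new handle would be stored in a fresh Package entry instead of replacing and closing the previous library.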
/* This is stuff.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dlfcn.h>
#define MATCH 0
#define TRUE 1
#define FALSE 0
#define DONE 0
#define NOT_DONE 1
#define BSIZE 1024

/* the type of the functions we care about in the compiled code */
typedef int (*FUNC)(int a, int b);

/* There are very obvious ways to extend how functions from the shared
   library are mapped to an API for them beyond what I did here. I didn't
   do any of them.
*/
typedef struct Funcs_t
{
    FUNC fun1;
    FUNC fun2;
    FUNC fun3;
} Funcs;

/* The global environmental structure */
typedef struct Env_t
{
    Funcs f;
    void *lib_handle;
} Env;

/* The invocable functions in the global environment are defaulted to these
   functions.
*/
int stub1(int a, int b)
{
    printf("Default stub1(%d, %d): called.\n", a, b);
    return a + b;
}

int stub2(int a, int b)
{
    printf("Default stub2(%d, %d): called.\n", a, b);
    return a + b;
}

int stub3(int a, int b)
{
    printf("Default stub3(%d, %d): called.\n", a, b);
    return a + b;
}

char *prompt_input(char *buf, int size, FILE *fin)
{
    printf("> ");
    fflush(NULL);
    return fgets(buf, size, fin);
}

int eval_help(char *args, Env *e)
{
    printf(
        "Help:\n"
        " help This help message\n"
        " quit Quits the program\n"
        " compile <file> Produces shared library of C file, don't provide the .c extension\n"
        " load <file> Loads library named NAME by loading libNAME.so\n"
    );
    return NOT_DONE;
}

int eval_quit(char *args, Env *e)
{
    printf("Quitting!\n");
    return DONE;
}

/* Compile a source file (given without the .c extension) and create a
   shared object we can load later. The error checking and reporting in
   this function is criminally bad.
*/
int eval_compile(char *args, Env *e)
{
    char cmd[BSIZE], file[BSIZE], buf[BSIZE];
    int ret;

    /* the field widths (BSIZE - 1) keep sscanf() from overflowing */
    if (sscanf(args, "%1023s %1023s", cmd, file) != 2) {
        printf("eval_compile: bad arity!\n");
        return NOT_DONE;
    }

    /* Construct and execute the compilation command. I hope everything
       is in your path. The file name goes straight to the shell, so only
       feed it names you trust.
    */
    snprintf(buf, sizeof(buf), "gcc -Wall -DPIC -fpic -c %s.c", file);
    ret = system(buf);
    if (ret != 0) {
        printf("Sorry, an error happened during compilation.\n");
        return NOT_DONE;
    } else {
        printf("Compile [%s.c]: OK\n", file);
    }

    /* Now produce the shared object */
    snprintf(buf, sizeof(buf),
             "gcc -shared -Wl,-soname,lib%s.so.1 %s.o -lc -o lib%s.so",
             file, file, file);
    ret = system(buf);
    if (ret != 0) {
        printf("Sorry, an error happened during shared library generation.\n");
    } else {
        printf("Library generation [lib%s.so]: OK\n", file);
    }
    return NOT_DONE;
}

/* We only allow you to invoke the functions in the Env structure. You
   denote the names by "fun1", "fun2", and "fun3". This is a bare skeleton
   of how to do such things, since I don't even create a symbol table for
   the mapping of the interpreter function symbol to actual C functions.
*/
int eval_invoke(char *cmd, Env *e)
{
    char buf[BSIZE], func_name[BSIZE];
    int arg0, arg1;
    int ret;

    if (sscanf(cmd, "%1023s %1023s %d %d", buf, func_name, &arg0, &arg1) != 4) {
        printf("eval_invoke: bad arity!\n");
        return NOT_DONE;
    }

    /* Now execute the function we wanted to run with the arguments.
       strcmp() requires an exact match, so "fun10" is not taken as "fun1".
    */
    if (strcmp("fun1", func_name) == MATCH) {
        printf("[Invoking function fun1...]\n");
        ret = (e->f.fun1)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else if (strcmp("fun2", func_name) == MATCH) {
        printf("[Invoking function fun2...]\n");
        ret = (e->f.fun2)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else if (strcmp("fun3", func_name) == MATCH) {
        printf("[Invoking function fun3...]\n");
        ret = (e->f.fun3)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else {
        printf("I'm sorry, there is no function to invoke by that name.\n");
    }
    return NOT_DONE;
}

int eval_load(char *cmd, Env *e)
{
    void *new_lib = NULL;
    char buf[BSIZE], lib_name[BSIZE];
    char name[BSIZE];

    if (sscanf(cmd, "%1023s %1023s", buf, lib_name) != 2) {
        printf("eval_load: bad arity!\n");
        return NOT_DONE;
    }

    /* Note: if this same library is still open, dlopen() may hand back the
       copy already in memory, so a recompiled libNAME.so might only be
       picked up after a different library has been loaded in between. */
    snprintf(name, sizeof(name), "./lib%s.so", lib_name);
    new_lib = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (new_lib == NULL) {
        /* dlerror() explains why, e.g. a missing file or unresolved symbol */
        printf("Failed to load library %s: %s\n", name, dlerror());
        return NOT_DONE;
    }

    /* close any previous one */
    if (e->lib_handle != NULL) {
        dlclose(e->lib_handle);
    }

    /* keep a reference to the new one */
    e->lib_handle = new_lib;

    /* "link" the functions in the Env to the ones we just loaded */
    e->f.fun1 = dlsym(e->lib_handle, "fun1");
    if (e->f.fun1 == NULL) {
        printf("Warning, unable to resolve fun1() from library %s, "
               "assuming initial stub1().\n", name);
        e->f.fun1 = stub1;
    }
    e->f.fun2 = dlsym(e->lib_handle, "fun2");
    if (e->f.fun2 == NULL) {
        printf("Warning, unable to resolve fun2() from library %s, "
               "assuming initial stub2().\n", name);
        e->f.fun2 = stub2;
    }
    e->f.fun3 = dlsym(e->lib_handle, "fun3");
    if (e->f.fun3 == NULL) {
        printf("Warning, unable to resolve fun3() from library %s, "
               "assuming initial stub3().\n", name);
        e->f.fun3 = stub3;
    }
    printf("Functions Linked!\n");
    return NOT_DONE;
}

/* The basic structure of the interpreter */
int eval_command(char *cmd, Env *e)
{
    printf("Evaluating command: '%s'\n", cmd);

    /* check to see what I have and run the appropriate handler */
    if (strncmp("help", cmd, 4) == MATCH) {
        return eval_help(cmd, e);
    }
    if (strncmp("quit", cmd, 4) == MATCH) {
        return eval_quit(cmd, e);
    }
    if (strncmp("compile", cmd, 7) == MATCH) {
        return eval_compile(cmd, e);
    }
    if (strncmp("invoke", cmd, 6) == MATCH) {
        return eval_invoke(cmd, e);
    }
    if (strncmp("load", cmd, 4) == MATCH) {
        return eval_load(cmd, e);
    }
    printf("Sorry, I don't know how to do that command.\n");
    return NOT_DONE;
}

int main(void)
{
    char buf[BSIZE];
    int done = NOT_DONE;
    char *ret = NULL;
    char *nl = NULL;
    Env e;

    /* set up defaults */
    e.f.fun1 = stub1;
    e.f.fun2 = stub2;
    e.f.fun3 = stub3;
    e.lib_handle = NULL;

    /* run the read/eval/print loop until done */
    printf("Welcome to a simple demonstration interpreter.\n");
    eval_help(NULL, &e);
    ret = prompt_input(buf, BSIZE, stdin);
    while (ret != NULL && done == NOT_DONE)
    {
        /* I'm not doing any real whitespace trimming, so be VERY careful */
        /* get rid of newline */
        nl = strchr(buf, '\n');
        if (nl != NULL) {
            *nl = '\0';
        }
        done = eval_command(buf, &e);
        if (done == NOT_DONE) {
            ret = prompt_input(buf, BSIZE, stdin);
        }
    }

    /* Clean up, if any */
    if (e.lib_handle != NULL) {
        dlclose(e.lib_handle);
        e.lib_handle = NULL;
    }
    return 0;
}
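The chain of strncmp() calls in eval_command() also accepts prefixes, so typing helpme runs help. One way to tidy that up is to use a table of command names that is matched against the whole first word. In this sketch, hypothetical demo_help() and demo_quit() handlers stand in for eval_help() and eval_quit():

```c
#include <stdio.h>
#include <string.h>

#define DONE 0
#define NOT_DONE 1

/* Only pointers to Env are needed here; stuff.c supplies the definition. */
typedef struct Env_t Env;
typedef int (*Handler)(char *cmd, Env *e);

typedef struct {
    const char *name;   /* command word typed at the prompt */
    Handler handler;    /* evaluator, e.g. eval_help() in stuff.c */
} Command;

/* Hypothetical stand-ins for eval_help() and eval_quit(). */
static int demo_help(char *cmd, Env *e) { (void)cmd; (void)e; return NOT_DONE; }
static int demo_quit(char *cmd, Env *e) { (void)cmd; (void)e; return DONE; }

static const Command commands[] = {
    { "help", demo_help },
    { "quit", demo_quit },
    { NULL, NULL }              /* end-of-table marker */
};

/* Match the first word of cmd exactly, so "helpme" does not run "help". */
const Command *find_command(const Command *table, const char *cmd)
{
    size_t len = strcspn(cmd, " \t");

    for (; table->name != NULL; table++) {
        if (strlen(table->name) == len && strncmp(table->name, cmd, len) == 0)
            return table;
    }
    return NULL;
}

int dispatch(const Command *table, char *cmd, Env *e)
{
    const Command *c = find_command(table, cmd);

    if (c == NULL) {
        printf("Sorry, I don't know how to do that command.\n");
        return NOT_DONE;
    }
    return c->handler(cmd, e);
}
```

Adding a command then means adding a row to the table. In stuff.c the rows would point at eval_help, eval_quit, eval_compile, eval_invoke, and eval_load, which already share the Handler signature.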
Now, here is the first file that we'll be using as a replacement for the
stub functions. This file (and bar.c below) must be in the current working
directory when you start the stuff program.
/* This is foo.c */
#include <stdio.h>
#include <stdlib.h>

int fun1(int a, int b)
{
    printf("This is foo.c:fun1()\n");
    fflush(NULL);
    return a + b;
}

int fun2(int a, int b)
{
    printf("This is foo.c:fun2()\n");
    fflush(NULL);
    return a + b;
}

int fun3(int a, int b)
{
    printf("This is foo.c:fun3()\n");
    fflush(NULL);
    return a + b;
}
And here is bar.c, another definition of the above functions.
/* This is bar.c */
#include <stdio.h>
#include <stdlib.h>

int fun1(int a, int b)
{
    printf("This is bar.c:fun1()\n");
    fflush(NULL);
    return a + b;
}

int fun2(int a, int b)
{
    printf("This is bar.c:fun2()\n");
    fflush(NULL);
    return a + b;
}

int fun3(int a, int b)
{
    printf("This is bar.c:fun3()\n");
    fflush(NULL);
    return a + b;
}
Now that we have everything defined, here is an interaction with
the program. Notice that the C files above are compiled by asking
the interpreter to compile them. Also notice how the output of the
functions "fun1", "fun2", and "fun3" changes from the defaults to
what is defined in each separate C file.
Linux black > ./stuff
Welcome to a simple demonstration interpreter.
Help:
help This help message
quit Quits the program
compile <file> Produces shared library of C file, don't provide .c
load <file> Loads library named NAME by loading libNAME.so
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
Default stub1(10, 10): called.
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
Default stub2(10, 10): called.
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
Default stub3(10, 10): called.
[Result] 20
> compile foo
Evaluating command: 'compile foo'
Compile [foo.c]: OK
Library generation [libfoo.so]: OK
> compile bar
Evaluating command: 'compile bar'
Compile [bar.c]: OK
Library generation [libbar.so]: OK
> load foo
Evaluating command: 'load foo'
Functions Linked!
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
This is foo.c:fun1()
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
This is foo.c:fun2()
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
This is foo.c:fun3()
[Result] 20
> load bar
Evaluating command: 'load bar'
Functions Linked!
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
This is bar.c:fun1()
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
This is bar.c:fun2()
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
This is bar.c:fun3()
[Result] 20
> quit
Evaluating command: 'quit'
Quitting!
Enhancement of the interpreter would go in the direction of allowing
all of the functions in the shared object to be discovered and shoved
into a symbol table stored in the Env environment so they can be
called. In addition, the arguments of the functions would be more
flexibly defined so you can pass other data types to them or define
them to have different arities. There is definitely more that can
be done.
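One way to get both a discoverable symbol table and varying arities, without parsing the shared object's dynamic symbol table, is a convention where each library exports a single table describing its functions. Everything in this sketch is a hypothetical convention, not something stuff.c does. That includes the Export type, the exported_functions symbol, GFUNC, and the negate() and add3() sample functions:

```c
#include <string.h>

/* Generic calling convention: arguments arrive as an array, so exported
   functions may differ in arity. */
typedef int (*GFUNC)(int argc, const int *argv);

/* One entry of the symbol table a library exports. */
typedef struct {
    const char *name;   /* name used at the interpreter prompt */
    int arity;          /* number of int arguments expected */
    GFUNC fn;
} Export;

/* What a library such as foo.c might define under this convention. */
static int negate(int argc, const int *argv) { (void)argc; return -argv[0]; }
static int add3(int argc, const int *argv)
{
    (void)argc;
    return argv[0] + argv[1] + argv[2];
}

const Export exported_functions[] = {
    { "negate", 1, negate },
    { "add3",   3, add3 },
    { NULL,     0, NULL }       /* end-of-table marker */
};

/* Look a name up in an export table; NULL if it is not there. */
const Export *find_export(const Export *table, const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table;
    }
    return NULL;
}

/* Call name with argc arguments and store the value in *result. Returns 0
   on success, -1 if the name is unknown or the arity does not match. */
int call_export(const Export *table, const char *name,
                int argc, const int *argv, int *result)
{
    const Export *x = find_export(table, name);

    if (x == NULL || x->arity != argc)
        return -1;
    *result = x->fn(argc, argv);
    return 0;
}
```

The interpreter would fetch the table with dlsym(handle, "exported_functions"), which returns the address of the array, and copy its entries into a symbol table in the Env. Supporting other data types would additionally mean replacing int with some tagged value type.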
End of Line.