Overview

Last updated Monday, January 31, 2005.

There are actually two separate projects here:

Changes to the core runtime swig type system in common.swg (described on this page)
Updating the guile and mzscheme modules with a whole slew of changes (see swig-guile.html)

As of January 31, 2005 this patch has been committed into CVS! The documentation that appears here has also been added. I will leave this page up for a while, but you should now see the current CVS code for the updated version and look at the CVS documentation for the latest updates.

What's the current status? Languages that have been updated and tested: Guile, Mzscheme, Chicken, Ocaml, Perl, Python, Ruby, Tcl, and Php4. My testing has been limited to the test-suite for all these languages except for guile, chicken, and mzscheme, which I have tested a little more extensively. I ran the test suite check without the patches, then ran it with the patches, and used diff to compare the output. The only differences were in the line numbers of error messages. Languages which have been updated but not tested: Pike. The guile-gh changes are not included in this patch, since there are two possible ways to update the guile-gh module. See the swig-guile.html page for more information.

Why? Why are these changes needed?

Speed. My patches are significantly faster when using modules with a large number of types. Here are some numbers found using the module created by hugemod.pl and just using the time command to time the loading of the module. This is on an AMD Athlon XP 3000 with 1 GB of RAM running Debian GNU/Linux.

loading just hugemod_a (a module with 6000 types)
------------------------------------------------
My code:
real    0m2.226s
user    0m1.928s
sys     0m0.115s

Current CVS:
real    0m10.127s
user    0m9.630s
sys     0m0.159s

loading both hugemod_a and hugemod_b (both modules have 6000 types)
--------------------------------------------------------------------
My code:
real    0m6.015s
user    0m5.551s
sys     0m0.208s

Current CVS:
real    0m52.981s
user    0m51.393s
sys     0m0.599s

Unloading modules. With these changes, it is possible to unload modules. I have not yet written this code, but see below for an idea of how it can work.
All languages are able to use runtime.swg for external access to the type system. With current CVS, only python and perl can access it.
Makes it possible to work with multiple copies of the interpreter loading the same shared library file. See these messages on swig-dev msg1 msg2. I have not yet written this code, but it is a high priority.
Cleanup of the runtime code. The data structures better reflect the actual data represented, the functions in swigrun.swg no longer take the swig_type_list_handle as a parameter, and other cleanups.

So what has changed in the type system? The first minor change was to split the swig_type_info structure into two structures (swig_type_info and swig_cast_info: see below for more info). The second change is to store a pointer to a swig_type_info rather than just the type name string in the linked list of casts. First off, this makes the guile module a little faster, and second, the SWIG_TypeClientData() function is faster too. The third and final change was to add the idea of a module into the type system. Before, all the types were stored in one huge linked list. Now, another level is added, and the type system stores a linked list of modules, each of which stores an array of types associated with it. This allows for module removal and easier searching, along with being able to attach some language-specific data to a module (used by my guile changes).

Unloading modules? Right now this patch does not have any support for unloading modules because it is a little tricky. If there are no type dependencies between modules, it is trivial to unload a module. But say we have something like this. Module A is loaded. Module B is loaded, and it uses some types from Module A. Now we want to unload Module A. One way is just to require that modules are unloaded in the reverse order they were loaded. But we can allow arbitrary unloading of modules: Module B will have to update to using some other swig_type_info structures because Module A's memory is going to be unloaded.

common-types-1.3.21.diff.gz: Against SWIG version 1.3.21. This version does not have any changes to chicken, due to Bug #782468. This bug has been fixed in CVS.
common-types.patch: Against CVS as of Jan 24, 2005. This patch has been committed into CVS, so just updating to the latest should work fine.
For a list of individual patches, go here.

Details

I made some modifications to the swig manual documentation, explaining the type system a little more. Here is the relevant section from the Typemaps chapter.

8.8 The run-time type checker

Most scripting languages need type information at run-time. This type information can include how to construct types, how to garbage collect types, and the inheritance relationships between types. If the language interface does not provide its own type information storage, the generated SWIG code needs to provide it.

Requirements for the type system:

Store inheritance and type equivalence information and be able to correctly re-create the type pointer.

Share type information between modules.

Modules can be loaded in any order, regardless of actual type dependency.

Avoid the use of dynamically allocated memory, and library/system calls in general.

Provide a reasonably fast implementation, minimizing the lookup time for all language modules.

Custom, language specific information can be attached to types.

Modules can be unloaded from the type system.

8.8.1 Implementation

A critical part of SWIG's operation is that of its run-time type checker. When pointers, arrays, and objects are wrapped by SWIG, they are normally converted into typed pointer objects. For example, an instance of Foo * might be a string encoded like this:

_108e688_p_Foo
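As a rough illustration of what such a string carries, the sketch below packs a pointer value and a mangled type name into one string and unpacks it again. The helper names toy_pack_ptr and toy_unpack_ptr are hypothetical, and the hex layout is an assumption made for illustration; the encoding actually used by the SWIG runtime may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "_<hex address><mangled name>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int toy_pack_ptr(char *buf, size_t size, void *ptr, const char *mangled) {
  int n;
  assert(buf != NULL && mangled != NULL);
  n = snprintf(buf, size, "_%" PRIxPTR "%s", (uintptr_t)ptr, mangled);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: recover the pointer only if the string carries the
 * expected mangled type name. Returns 0 on success, -1 on any mismatch. */
static int toy_unpack_ptr(const char *s, const char *mangled, void **out) {
  uintptr_t value;
  int consumed = 0;
  assert(s != NULL && mangled != NULL && out != NULL);
  if (s[0] != '_') return -1;
  /* The hex digits stop at the '_' that begins the mangled name. */
  if (sscanf(s + 1, "%" SCNxPTR "%n", &value, &consumed) != 1) return -1;
  if (strcmp(s + 1 + consumed, mangled) != 0) return -1;
  *out = (void *)value;
  return 0;
}
```

Packing a pointer with the name "_p_Foo" and then asking for "_p_Bar" fails, which is the small amount of type safety the string form gives back.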

At a basic level, the type checker simply restores some type-safety to extension modules. However, the type checker is also responsible for making sure that wrapped C++ classes are handled correctly---especially when inheritance is used. This is especially important when an extension module makes use of multiple inheritance. For example:

class Foo {
   int x;
};

class Bar {
   int y;
};

class FooBar : public Foo, public Bar {
   int z;
};

When the class FooBar is organized in memory, it contains the contents of the classes Foo and Bar as well as its own data members. For example:

FooBar --> |------------|  <-- Foo
           |   int x    |
           |------------|  <-- Bar
           |   int y    |
           |------------|
           |   int z    |
           |------------|

Because of the way that base class data is stacked together, the casting of a FooBar * to either of the base classes may change the actual value of the pointer. This means that it is generally not safe to represent pointers using a simple integer or a bare void *---type tags are needed to implement correct handling of pointer values (and to make adjustments when needed).
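The pointer adjustment can be modelled in plain C. The sketch below is an assumption-laden stand-in: it embeds the two bases as struct members in the order shown in the diagram, and the converter names foobar_to_foo and foobar_to_bar are hypothetical. A real C++ compiler chooses the layout itself and performs the adjustment as part of the cast.

```c
#include <assert.h>
#include <stddef.h>

/* C stand-ins for the C++ classes above; FooBar embeds its bases in order. */
struct Foo { int x; };
struct Bar { int y; };
struct FooBar { struct Foo foo; struct Bar bar; int z; };

/* FooBar * -> Foo *: the first base sits at offset 0, so the address is kept. */
static void *foobar_to_foo(void *p) {
  assert(p != NULL);
  return (char *)p + offsetof(struct FooBar, foo);
}

/* FooBar * -> Bar *: the second base sits after Foo, so the address shifts. */
static void *foobar_to_bar(void *p) {
  assert(p != NULL);
  return (char *)p + offsetof(struct FooBar, bar);
}
```

A bare void * holding the address of a FooBar would therefore be wrong if it were reinterpreted directly as a Bar *, which is why the conversion has to go through a function that knows both types.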

In the wrapper code generated for each language, pointers are handled through the use of special type descriptors and conversion functions. For example, if you look at the wrapper code for Python, you will see code like this:

if ((SWIG_ConvertPtr(obj0,(void **) &arg1, SWIGTYPE_p_Foo,1)) == -1) return NULL;

In this code, SWIGTYPE_p_Foo is the type descriptor that describes Foo *. The type descriptor is actually a pointer to a structure that contains information about the type name to use in the target language, a list of equivalent typenames (via typedef or inheritance), and pointer value handling information (if applicable). The SWIG_ConvertPtr() function is simply a utility function that takes a pointer object in the target language and a type-descriptor object and uses this information to generate a C++ pointer. However, the exact name and calling conventions of the conversion function depend on the target language (see language specific chapters for details).

The actual type code is in common.swg, and gets inserted near the top of the generated swig wrapper file. The phrase "a type X that can cast into a type Y" means that given a type X, it can be converted into a type Y. In other words, X is a derived class of Y or X is a typedef of Y. The structure to store type information looks like this:

/* Structure to store information on one type */
typedef struct swig_type_info {
  const char *name;             /* mangled name of this type */
  const char *str;              /* human readable name for this type */
  swig_dycast_func dcast;       /* dynamic cast function down a hierarchy */
  struct swig_cast_info *cast;  /* Linked list of types that can cast into this type */
  void *clientdata;             /* Language specific type data */
} swig_type_info;

/* Structure to store a type and conversion function used for casting */
typedef struct swig_cast_info {
  swig_type_info *type;          /* pointer to type that is equivalent to this type */
  swig_converter_func converter; /* function to cast the void pointers */
  struct swig_cast_info *next;   /* pointer to next cast in linked list */
  struct swig_cast_info *prev;   /* pointer to the previous cast */
} swig_cast_info;

Each swig_type_info stores a linked list of types that it is equivalent to. Each entry in this doubly linked list stores a pointer back to another swig_type_info structure, along with a pointer to a conversion function. This conversion function is used to solve the above problem of the FooBar class, correctly returning a pointer to the type we want.

The basic problem we need to solve is verifying and building arguments passed to functions. So going back to the SWIG_ConvertPtr() function example from above, we are expecting a Foo * and need to check if obj0 is in fact a Foo *. From before, SWIGTYPE_p_Foo is just a pointer to the swig_type_info structure describing Foo *. So we loop through the linked list of swig_cast_info structures attached to SWIGTYPE_p_Foo. If we see that the type of obj0 is in the linked list, we pass the object through the associated conversion function and then return a positive. If we reach the end of the linked list without a match, then obj0 cannot be converted to a Foo * and an error is generated.
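The loop described above can be sketched as follows. The toy_ structures are cut-down, hypothetical versions of swig_type_info and swig_cast_info (only the fields the walk needs), and toy_type_check, toy_type_cast and toy_skip_int are illustrative names, not the runtime's own functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*toy_converter_func)(void *);
struct toy_cast_info;

typedef struct toy_type_info {
  const char *name;             /* mangled name of this type */
  struct toy_cast_info *cast;   /* types that can cast into this type */
} toy_type_info;

typedef struct toy_cast_info {
  toy_type_info *type;          /* a type that can be cast into the owner */
  toy_converter_func converter; /* adjusts the pointer; NULL means no change */
  struct toy_cast_info *next;
} toy_cast_info;

/* Walk the cast list of `expected`, looking for the type named `name`. */
static toy_cast_info *toy_type_check(const char *name, toy_type_info *expected) {
  toy_cast_info *c;
  assert(name != NULL && expected != NULL);
  for (c = expected->cast; c != NULL; c = c->next) {
    if (strcmp(c->type->name, name) == 0) return c;
  }
  return NULL; /* end of list reached: no conversion exists */
}

/* Pass the pointer through the converter stored in a matching entry. */
static void *toy_type_cast(toy_cast_info *c, void *ptr) {
  assert(c != NULL);
  return c->converter ? c->converter(ptr) : ptr;
}

/* Example converter for demonstration: skip over one leading int. */
static void *toy_skip_int(void *p) {
  return (char *)p + sizeof(int);
}
```

A NULL result from toy_type_check corresponds to the error case in the text; a non-NULL result is handed to toy_type_cast to obtain the adjusted pointer.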

Another issue that needs to be addressed is sharing type information between multiple modules. More explicitly, we need to have ONE swig_type_info for each type. If two modules both use the type, the second module loaded must look up and use the swig_type_info structure from the module already loaded. Because no dynamic memory is used and the casting information has circular dependencies, loading the type information is somewhat tricky and is not explained here. A complete description is in the common.swg file (and near the top of any generated file).

Each module has one swig_module_info structure which looks like this:

/* Structure used to store module information
 * Each module generates one structure like this, and the runtime collects
 * all of these structures and stores them in a circularly linked list.*/
typedef struct swig_module_info {
  swig_type_info **types;         /* Array of pointers to swig_type_info structures that are in this module */
  int size;                       /* Number of types in this module */
  struct swig_module_info *next;  /* Pointer to next element in circularly linked list */
  swig_type_info **type_initial;  /* Array of initially generated type structures */
  swig_cast_info **cast_initial;  /* Array of initially generated casting structures */
  void *clientdata;               /* Language specific module data */
} swig_module_info;

Each module stores an array of pointers to swig_type_info structures and the number of types in this module. So when a second module is loaded, it finds the swig_module_info structure for the first module and searches the array of types. If any of its own types are in the first module and have already been loaded, it uses those swig_type_info structures rather than creating new ones. These swig_module_info structures are chained together in a circularly linked list.
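A minimal sketch of that lookup, assuming cut-down hypothetical structures and a plain linear scan (the real runtime may organise the search differently). toy_query_modules walks the circular list once, and toy_add_module shows one way a new module could be linked into the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_type_info {
  const char *name;               /* mangled name of this type */
} toy_type_info;

typedef struct toy_module_info {
  toy_type_info **types;          /* array of pointers to this module's types */
  size_t size;                    /* number of entries in types */
  struct toy_module_info *next;   /* circular list; a lone module points to itself */
} toy_module_info;

/* Search every module on the ring, starting at `start`, for a type name. */
static toy_type_info *toy_query_modules(toy_module_info *start, const char *name) {
  toy_module_info *m = start;
  assert(start != NULL && name != NULL);
  do {
    size_t i;
    for (i = 0; i < m->size; i++) {
      if (strcmp(m->types[i]->name, name) == 0) return m->types[i];
    }
    m = m->next;
  } while (m != start);
  return NULL; /* went all the way around without a match */
}

/* Link a newly loaded module into the ring right after `head`. */
static void toy_add_module(toy_module_info *head, toy_module_info *mod) {
  assert(head != NULL && mod != NULL);
  mod->next = head->next;
  head->next = mod;
}
```

Because the list is circular, a search can begin at whichever module happens to be at hand and still reach every other loaded module.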

8.8.2 Usage

When pointers are converted in a typemap, the typemap code often looks similar to this:

%typemap(in) Foo * {
  if ((SWIG_ConvertPtr($input, (void **) &$1, $1_descriptor)) == -1) return NULL;
}

The most critical part of the typemap is the use of the $1_descriptor special variable. When placed in a typemap, this is expanded into the SWIGTYPE_* type descriptor object above. As a general rule, you should always use $1_descriptor instead of trying to hard-code the type descriptor name directly.

There is another reason why you should always use the $1_descriptor variable. When this special variable is expanded, SWIG marks the corresponding type as "in use." When type-tables and type information are emitted in the wrapper file, descriptor information is only generated for those datatypes that were actually used in the interface. This greatly reduces the size of the type tables and improves efficiency.

Occasionally, you might need to write a typemap that needs to convert pointers of other types. To handle this, a special macro substitution $descriptor(type) can be used to generate the SWIG type descriptor name for any C datatype. For example:

%typemap(in) Foo * {
  if ((SWIG_ConvertPtr($input, (void **) &$1, $1_descriptor)) == -1) {
     Bar *temp;
     if ((SWIG_ConvertPtr($input, (void **) &temp, $descriptor(Bar *))) == -1) {
         return NULL;
     }
     $1 = (Foo *) temp;
  }
}

The primary use of $descriptor(type) is when writing typemaps for container objects and other complex data structures. There are some restrictions on the argument---namely it must be a fully defined C datatype. It cannot be any of the special typemap variables.

In certain cases, SWIG may not generate type-descriptors like you expect. For example, if you are converting pointers in some non-standard way or working with an unusual combination of interface files and modules, you may find that SWIG omits information for a specific type descriptor. To fix this, you may need to use the %types directive. For example:

%types(int *, short *, long *, float *, double *);

When %types is used, SWIG generates type-descriptor information even if those datatypes never appear elsewhere in the interface file.

A final problem related to the type-checker is the conversion of types in code that is external to the SWIG wrapper file. This situation is somewhat rare in practice, but occasionally a programmer may want to convert a typed pointer object into a C++ pointer somewhere else in their program. The only problem is that the SWIG type descriptor objects are only defined in the wrapper code and not normally accessible.

To correctly deal with this situation, the following technique can be used:


/* Some non-SWIG file */

/* External declarations */
extern void *SWIG_TypeQuery(const char *);
extern int   SWIG_ConvertPtr(PyObject *, void **ptr, void *descr);

void foo(PyObject *o) {
   Foo *f;
   static void *descr = 0;
   if (!descr) {
      descr = SWIG_TypeQuery("Foo *");    /* Get the type descriptor structure for Foo */
      assert(descr);
   }
   if ((SWIG_ConvertPtr(o,(void **) &f, descr) == -1)) {
       abort();
   }
   ...
}

Further details about the run-time type checking can be found in the documentation for individual language modules. Reading the source code may also help. The file common.swg in the SWIG library contains all of the source code for type-checking. This code is also included in every generated wrapper file, so you can probably just look at the output of SWIG to get a better sense for how types are managed.

Loading the type information

I have added a description to swiginit.swg describing how types are loaded, and below is a copy of it.

 * Type initialization:
 * This problem is made tough by the requirement that no dynamic
 * memory is used. Also, since swig_type_info structures store pointers to 
 * swig_cast_info structures and swig_cast_info structures store pointers back
 * to swig_type_info structures, we need some lookup code at initialization. 
 * The idea is that swig generates all the structures that are needed. 
 * The runtime then collects these partially filled structures. 
 * The SWIG_InitializeModule function takes these initial arrays out of 
 * swig_module, and does all the lookup, filling in the swig_module.types
 * array with the correct data and linking the correct swig_cast_info
 * structures together.

 * The generated swig_type_info structures are assigned statically to an initial
 * array. We just loop through that array, and handle each type individually.
 * First we look up whether this type has already been loaded, and if so, use the
 * loaded structure instead of the generated one. Then we have to fill in the
 * cast linked list. The cast data is initially stored in something like a
 * two-dimensional array. Each row corresponds to a type (there are the same
 * number of rows as there are in the swig_type_initial array). Each entry in
 * a column is one of the swig_cast_info structures for that type.
 * The cast_initial array is actually an array of arrays, because each row has
 * a variable number of columns. So to actually build the cast linked list,
 * we find the array of casts associated with the type, and loop through it 
 * adding the casts to the list. The one last trick we need to do is making
 * sure the type pointer in the swig_cast_info struct is correct.

 * First off, we look up the cast->type name to see if it is already loaded.
 * There are three cases to handle:
 *  1) If the cast->type has already been loaded AND the type we are adding
 *     casting info to has not been loaded (it is in this module), THEN we
 *     replace the cast->type pointer with the type pointer that has already
 *     been loaded.
 *  2) If BOTH types (the one we are adding casting info to, and the 
 *     cast->type) are loaded, THEN the cast info has already been loaded by
 *     the previous module so we just ignore it.
 *  3) Finally, if cast->type has not already been loaded, then we add that
 *     swig_cast_info to the linked list (because the cast->type pointer will
 *     be correct).
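The three cases reduce to a small decision on two facts: whether cast->type was already loaded by an earlier module, and whether the type receiving the cast information was. The sketch below only encodes that decision; the enum and the function name toy_classify_cast are hypothetical and do not appear in swiginit.swg.

```c
/* What to do with one generated cast entry during initialization. */
typedef enum {
  TOY_REPLACE_TYPE_POINTER, /* case 1: repoint cast->type at the loaded type */
  TOY_SKIP_CAST,            /* case 2: an earlier module already loaded this cast */
  TOY_LINK_AS_IS            /* case 3: cast->type is new, its pointer is correct */
} toy_cast_action;

/* cast_type_loaded: non-zero if cast->type was found in a loaded module.
 * owner_loaded:     non-zero if the type we are adding casts to was found. */
static toy_cast_action toy_classify_cast(int cast_type_loaded, int owner_loaded) {
  if (!cast_type_loaded) return TOY_LINK_AS_IS;
  return owner_loaded ? TOY_SKIP_CAST : TOY_REPLACE_TYPE_POINTER;
}
```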

Documentation about writing runtime support

I have added a description of how to write runtime support for new language modules in Doc/Devel/runtime.txt. Here is a copy of it.

This file describes the necessary functions and interfaces a language module
needs to implement to take advantage of the run time type system.  I assume you
have read the run-time section of the Typemaps chapter in the SWIG
documentation.

Last updated: January 23, 2005

The file we are concerned with here should be named langrun.swg.  A good example
of a simple file is the Lib/mzscheme/mzrun.swg file.  First, a few requirements
and notes:

1) Every function in this file should be declared static.  

2) It should be inserted into the runtime section of the _wrap file from your
config file.  The Lib/swigrun.swg file should be included before this file.
That is, you need to have
%runtime "swigrun.swg" 
%runtime "langrun.swg"

3) You must also include the swiginit.swg file in the init section of the
wrapper.  That is, you should have
%insert(init) "swiginit.swg"

4) From module.cxx, you need to call the SwigType_emit_type_table function, as
well as register types with SwigType_remember or SwigType_remember_clientdata

5) By convention, all functions in this file are of the form
SWIG_Language_Whatever, and #defines are used to rename SWIG API functions to
these function names

6) You need to call void SWIG_InitializeModule(void *clientdata) from your init
function.

-------------------------------------------------------------------------------
Required Functions
-------------------------------------------------------------------------------
swig_module_info *SWIG_GetModule(void *clientdata);
void SWIG_SetModule(void *clientdata, swig_module_info *mod);

The SetModule function should store the mod argument into some globally
accessible variable in the target language.  The action of these two functions
is to provide a way for multiple modules to share information.  The SetModule
function should create a new global var named something like
"swig_runtime_data_type_pointer" SWIG_RUNTIME_VERSION SWIG_TYPE_TABLE_NAME.
SWIG_RUNTIME_VERSION is currently defined as "2", and SWIG_TYPE_TABLE_NAME is
defined by the -DSWIG_TYPE_TABLE=mytable option when compiling the wrapper.

Alternatively, if the language supports modules, a module named
"swig_runtime_data" SWIG_RUNTIME_VERSION can be created, and a global variable
named "type_table" SWIG_TYPE_TABLE_NAME can be created inside it.  The most
common approach is to store the mod pointer in some global variable in the
target language, but if the language provides an alternative place to store data
(like the chicken module), then that is good too.

The way the code is set up, SetModule should only be called when GetModule
returns NULL, and if SetModule is called a second time, the behavior is
undefined. Just make sure it doesn't crash in the random chance occurrence that
SetModule is called twice.

There are two options here.  

1) The preferred approach is for GetModule and SetModule to not require a
clientdata pointer.  If you can at all avoid it, please do so.  Here, you would
write swig_module_info *SWIG_Language_GetModule(); 
void SWIG_Language_SetModule(swig_module_info *mod);
and then add
#define SWIG_GetModule(clientdata) SWIG_Language_GetModule()
#define SWIG_SetModule(cd, ptr) SWIG_Language_SetModule(ptr)
You would then call
SWIG_InitializeModule(0)

2) If GetModule and SetModule need to take a custom pointer (most notably an
environment pointer, see tcl or mzscheme), then you should write
swig_module_info *SWIG_Language_GetModule(void *clientdata);
void SWIG_Language_SetModule(void *clientdata, swig_module_info *mod);
and also define
#define SWIG_GetModule(cd) SWIG_Language_GetModule(cd)
#define SWIG_SetModule(cd, ptr) SWIG_Language_SetModule(cd, ptr)
#define SWIG_MODULE_CLIENTDATA_TYPE Whatever
SWIG_MODULE_CLIENTDATA_TYPE should be defined to whatever the type of
clientdata is.

You would then call SWIG_InitializeModule(clientdata), and clientdata would get
passed to GetModule and SetModule.  clientdata will not be stored and will only
be referenced during the InitializeModule call.  After InitializeModule returns,
clientdata does not need to be valid any more.

This method is not preferred, because it makes external access to the type
system more complicated.  See the Modules chapter of the documentation, and read
the "External access to the run-time" section.  Then take a look at
Lib/runtime.swg.  Anybody that calls SWIG_TypeQuery needs to pass along the
clientdata pointer, and that is the reason for defining
SWIG_MODULE_CLIENTDATA_TYPE.
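A minimal sketch of the first (preferred) option follows. The static variable toy_language_global is a hypothetical stand-in for a global variable in the target language, and the swig_module_info here is reduced to a single field so the example is self-contained. A real language module would store and fetch the pointer through the interpreter's own API.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in; the real swig_module_info has more fields. */
typedef struct swig_module_info {
  void *clientdata;
} swig_module_info;

/* Stands in for a global variable in the target language (assumption). */
static swig_module_info *toy_language_global = NULL;

static swig_module_info *SWIG_Language_GetModule(void) {
  return toy_language_global; /* NULL until some module has registered */
}

static void SWIG_Language_SetModule(swig_module_info *mod) {
  assert(mod != NULL);
  /* A second call is not expected; ignore it rather than crash. */
  if (toy_language_global == NULL) toy_language_global = mod;
}

/* Map the generic API names onto the clientdata-free functions. */
#define SWIG_GetModule(clientdata) SWIG_Language_GetModule()
#define SWIG_SetModule(cd, ptr) SWIG_Language_SetModule(ptr)
```

Ignoring a repeated SetModule call is one design choice for the "called twice" situation mentioned above; since the behavior is undefined there, other choices would be equally valid.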

-------------------------------------------------------------------------------
Standard Functions
-------------------------------------------------------------------------------
These functions are not required and their API is not formalized, but almost all
language modules implement them for consistency across languages.  Throughout
this discussion, I will use LangType to represent the underlying language type
(C_word in chicken, Scheme_Object * in mzscheme, PyObject * in python, etc)



LangType SWIG_NewPointerObj(void *ptr, swig_type_info *type, int flags);
Create and return a new pointer object that has both ptr and type.  For almost
all language modules, flags is used for ownership.  If flags==1, then the
created pointer should be registered to be garbage collected.



int SWIG_ConvertPtr(LangType obj, void **result, swig_type_info *type, int flags);
Convert a language wrapped pointer into a void *.  The pointer is returned in
result, and the function should return 0 on success, non-zero on error.
A sample ConvertPtr is given here:

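  swig_cast_info *cast;

  if (<obj is a wrapped pointer type>) {
    /* call only one of the following two, see below */
    cast = SWIG_TypeCheck(<obj type name>, type);
    cast = SWIG_TypeCheckStruct(<obj type structure>, type);
    if (cast) {
      *result = SWIG_TypeCast(cast, <obj pointer>);
      return 0;
    }
  }
  return 1;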

Either TypeCheck or TypeCheckStruct can be called, depending on how the pointer
is wrapped in langtype.  If obj stores the void pointer and the type name, then
the TypeCheck function should be used, while if obj stores the void pointer and
a pointer to the swig_type_info structure, then the TypeCheckStruct function
should be called.  The TypeCheckStruct is slightly faster, since it does a
pointer comparison instead of a strcmp.  
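For a concrete picture of the TypeCheckStruct path, here is a sketch for an imaginary target language whose wrapped pointer object stores the void pointer together with a pointer to the type structure. Everything prefixed toy_ is hypothetical: the structures are reduced copies of the ones described earlier, and the flags argument is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*toy_converter_func)(void *);
struct toy_cast_info;

typedef struct toy_type_info {
  const char *name;
  struct toy_cast_info *cast;   /* types that can cast into this type */
} toy_type_info;

typedef struct toy_cast_info {
  toy_type_info *type;
  toy_converter_func converter; /* NULL means the pointer is used unchanged */
  struct toy_cast_info *next;
} toy_cast_info;

/* Hypothetical wrapped pointer: the object carries the type structure itself. */
typedef struct toy_lang_obj {
  void *ptr;
  toy_type_info *type;
} toy_lang_obj;

/* Pointer-comparison variant of the check; no strcmp is needed. */
static toy_cast_info *toy_type_check_struct(toy_type_info *from, toy_type_info *expected) {
  toy_cast_info *c;
  for (c = expected->cast; c != NULL; c = c->next) {
    if (c->type == from) return c;
  }
  return NULL;
}

/* Returns 0 on success and non-zero on error, as described above. */
static int toy_convert_ptr(const toy_lang_obj *obj, void **result, toy_type_info *expected) {
  toy_cast_info *cast;
  assert(result != NULL && expected != NULL);
  if (obj == NULL || obj->type == NULL) return 1; /* not a wrapped pointer */
  cast = toy_type_check_struct(obj->type, expected);
  if (cast == NULL) return 1;
  *result = cast->converter ? cast->converter(obj->ptr) : obj->ptr;
  return 0;
}
```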



void *SWIG_MustGetPtr(LangType obj, swig_type_info *type, int flags,
                      int argnum, const char *func_name) {
  void *result;
  if (SWIG_ConvertPtr(obj, &result, type, flags)) {
    generate runtime type error ("Error in func_name, expected a " +
                                 (type->str ? type->str : "void *") +
                                 " at argument number " + argnum);
  }
  return result;
}
This function is optional, and the number and type of parameters can be
different, but is useful for typemap purposes:
%typemap(in) SWIGTYPE *, SWIGTYPE &, SWIGTYPE [] {
  $1 = ($1_ltype)SWIG_MustGetPtr($input, $descriptor, 0, $argnum, FUNC_NAME);
}
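How the error is raised depends entirely on the target language, but the message itself can be built in portable C. The helper below is a hypothetical sketch (toy_format_type_error is not part of SWIG) that follows the wording of the MustGetPtr pseudo-code, falling back to "void *" when the type has no human readable name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the message text for a failed conversion.
 * type_str may be NULL, in which case "void *" is used.
 * Returns the value of snprintf (the length the full message needs). */
static int toy_format_type_error(char *buf, size_t size, const char *func_name,
                                 const char *type_str, int argnum) {
  assert(buf != NULL && func_name != NULL);
  return snprintf(buf, size, "Error in %s, expected a %s at argument number %d",
                  func_name, type_str ? type_str : "void *", argnum);
}
```

A language module would pass the resulting buffer to whatever mechanism its interpreter uses to signal a runtime type error.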