As of January 31, 2005 this patch has been committed into CVS! The documentation that appears here has also been added. I will leave this page up for a while, but you should now see the current CVS code for the updated version and look at the CVS documentation for the latest updates.
What's the current status? Languages that have been updated and tested: Guile, Mzscheme, Chicken, Ocaml, Perl, Python, Ruby, Tcl, and Php4. My testing has been limited to the test-suite for all of these languages except Guile, Chicken, and Mzscheme, which I have tested a little more extensively. I ran the test-suite check without the patches, ran it again with the patches, and used diff to compare the output. The only differences were in the line numbers of error messages. Languages that have been updated but not tested: Pike. The guile-gh changes are not included in this patch, since there are two possible ways to update the guile-gh module. See the swig-guile.html page for more information.
Why? Why are these changes needed?
loading just hugemod_a (a module with 6000 types)
------------------------------------------------
My code:
    real 0m2.226s
    user 0m1.928s
    sys  0m0.115s
Current CVS:
    real 0m10.127s
    user 0m9.630s
    sys  0m0.159s

loading both hugemod_a and hugemod_b (both modules have 6000 types)
--------------------------------------------------------------------
My code:
    real 0m6.015s
    user 0m5.551s
    sys  0m0.208s
Current CVS:
    real 0m52.981s
    user 0m51.393s
    sys  0m0.599s
So what has changed in the type system? The first minor change was to split up swig_type_info structure into two structures (swig_type_info and swig_cast_info: see below for more info). The second change is to store a pointer to a swig_type_info rather than just the type name string in the linked list of casts. First off, this makes the guile module a little faster, and second, the SWIG_TypeClientData() function is faster too. The third and final change was to add the idea of a module into the type system. Before, all the types were stored in one huge linked list. Now, another level is added, and the type system stores a linked list of modules, each of which stores an array of types associated with it. This allows for module removal and easier searching, along with being able to attach some language specific stuff to a module (used by my guile changes).
Unloading modules? Right now this patch does not support unloading modules, because it is a little tricky. If there are no type dependencies between modules, unloading a module is trivial. But say we have something like this: Module A is loaded. Module B is loaded, and it uses some types from Module A. Now we want to unload Module A. One way is just to require that modules are unloaded in the reverse order they were loaded. If we instead allow arbitrary unloading of modules, Module B will have to switch to some other swig_type_info structures, because Module A's memory is going to be unloaded.
I made some modifications to the swig manual documentation, explaining the type system a little more. Here is the relevant section from the Typemaps chapter.
At a basic level, the type checker simply restores some type-safety to extension modules. However, the type checker is also responsible for making sure that wrapped C++ classes are handled correctly---especially when inheritance is used. This is especially important when an extension module makes use of multiple inheritance. For example:

    class Foo {
       int x;
    };

    class Bar {
       int y;
    };

    class FooBar : public Foo, public Bar {
       int z;
    };

When the class FooBar is organized in memory, it contains the contents of the classes Foo and Bar as well as its own data members. For example:

    FooBar --> | -----------|  <-- Foo
               |   int x    |
               |------------|  <-- Bar
               |   int y    |
               |------------|
               |   int z    |
               |------------|

Because of the way that base class data is stacked together, the casting of a FooBar * to either of the base classes may change the actual value of the pointer. This means that it is generally not safe to represent pointers using a simple integer or a bare void *---type tags are needed to implement correct handling of pointer values (and to make adjustments when needed).
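The pointer adjustment can be made visible with a rough C analogue. This is only a sketch: it assumes the base-class parts are laid out in declaration order, as in the diagram, and the struct and helper names (foobar_as_foo, foobar_as_bar) are illustrative and not part of SWIG.

```c
#include <assert.h>
#include <stddef.h>

/* C stand-ins for the C++ classes: FooBar embeds Foo, then Bar, then z. */
struct Foo { int x; };
struct Bar { int y; };
struct FooBar {
    struct Foo foo;   /* first base-class part, at offset 0 */
    struct Bar bar;   /* second base-class part, at a nonzero offset */
    int z;
};

/* "Upcast" to the first base: the address is unchanged. */
static struct Foo *foobar_as_foo(struct FooBar *fb) {
    assert(fb != NULL);
    return &fb->foo;
}

/* "Upcast" to the second base: the address moves past the Foo part. */
static struct Bar *foobar_as_bar(struct FooBar *fb) {
    assert(fb != NULL);
    return &fb->bar;
}
```

A bare void * copied from a FooBar and then read as a Bar would land on x rather than y, which is why the type system pairs each cast with a conversion function.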
In the wrapper code generated for each language, pointers are handled through the use of special type descriptors and conversion functions. For example, if you look at the wrapper code for Python, you will see code like this:

    if ((SWIG_ConvertPtr(obj0,(void **) &arg1, SWIGTYPE_p_Foo,1)) == -1) return NULL;

In this code, SWIGTYPE_p_Foo is the type descriptor that describes Foo *. The type descriptor is actually a pointer to a structure that contains information about the type name to use in the target language, a list of equivalent typenames (via typedef or inheritance), and pointer value handling information (if applicable). The SWIG_ConvertPtr() function is simply a utility function that takes a pointer object in the target language and a type-descriptor object and uses this information to generate a C++ pointer. However, the exact name and calling conventions of the conversion function depend on the target language (see language specific chapters for details).
The actual type code is in common.swg, and gets inserted near the top of the generated swig wrapper file. The phrase "a type X that can cast into a type Y" means that given a type X, it can be converted into a type Y. In other words, X is a derived class of Y or X is a typedef of Y. The structure to store type information looks like this:

    /* Structure to store information on one type */
    typedef struct swig_type_info {
      const char *name;             /* mangled name of this type */
      const char *str;              /* human readable name for this type */
      swig_dycast_func dcast;       /* dynamic cast function down a hierarchy */
      struct swig_cast_info *cast;  /* Linked list of types that can cast into this type */
      void *clientdata;             /* Language specific type data */
    } swig_type_info;

    /* Structure to store a type and conversion function used for casting */
    typedef struct swig_cast_info {
      swig_type_info *type;          /* pointer to type that is equivalent to this type */
      swig_converter_func converter; /* function to cast the void pointers */
      struct swig_cast_info *next;   /* pointer to next cast in linked list */
      struct swig_cast_info *prev;   /* pointer to the previous cast */
    } swig_cast_info;

Each swig_type_info stores a linked list of types that it is equivalent to. Each entry in this doubly linked list stores a pointer back to another swig_type_info structure, along with a pointer to a conversion function. This conversion function is used to solve the above problem of the FooBar class, correctly returning a pointer to the type we want.
The basic problem we need to solve is verifying and building arguments passed to functions. So going back to the SWIG_ConvertPtr() function example from above, we are expecting a Foo * and need to check if obj0 is in fact a Foo *. From before, SWIGTYPE_p_Foo is just a pointer to the swig_type_info structure describing Foo *. So we loop through the linked list of swig_cast_info structures attached to SWIGTYPE_p_Foo. If we see that the type of obj0 is in the linked list, we pass the object through the associated conversion function and then report success. If we reach the end of the linked list without a match, then obj0 cannot be converted to a Foo * and an error is generated.
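The lookup just described can be sketched in a few lines of C. This is a simplified illustration, not the code from common.swg: the structures are cut down to the fields the walk needs, and the names type_info, cast_info, find_cast, convert_ptr and skip_one_int are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*converter_func)(void *);

struct cast_info;

typedef struct type_info {
    const char *name;          /* mangled type name, e.g. "_p_Foo" */
    struct cast_info *cast;    /* types that can be cast into this type */
} type_info;

typedef struct cast_info {
    type_info *type;           /* a type that converts to the owning type */
    converter_func converter;  /* pointer adjustment, or NULL if none is needed */
    struct cast_info *next;
} cast_info;

/* Example converter: stands in for a FooBar -> Bar adjustment that must
 * step over one leading int. */
static void *skip_one_int(void *p) {
    return (char *)p + sizeof(int);
}

/* Walk target's cast list looking for an entry whose type is named from_name. */
static cast_info *find_cast(const char *from_name, type_info *target) {
    cast_info *c;
    assert(from_name != NULL && target != NULL);
    for (c = target->cast; c != NULL; c = c->next) {
        if (strcmp(c->type->name, from_name) == 0)
            return c;
    }
    return NULL;
}

/* Convert ptr (whose type is from_name) to target.
 * Returns 0 on success and -1 if no cast entry matches. */
static int convert_ptr(void *ptr, const char *from_name,
                       type_info *target, void **out) {
    cast_info *c = find_cast(from_name, target);
    if (c == NULL)
        return -1;
    *out = c->converter ? c->converter(ptr) : ptr;
    return 0;
}
```

In this sketch a type is assumed to appear in its own cast list with a NULL converter, so converting a Bar * to a Bar * succeeds without any adjustment.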
Another issue that needs to be addressed is sharing type information between multiple modules. More explicitly, we need to have ONE swig_type_info for each type. If two modules both use the type, the second module loaded must look up and use the swig_type_info structure from the module already loaded. Because no dynamic memory is used, and because of the circular dependencies in the casting information, loading the type information is somewhat tricky and is not explained here. A complete description is in the common.swg file (and near the top of any generated file).
Each module has one swig_module_info structure which looks like this:

    /* Structure used to store module information
     * Each module generates one structure like this, and the runtime collects
     * all of these structures and stores them in a circularly linked list.*/
    typedef struct swig_module_info {
      swig_type_info **types;         /* Array of pointers to swig_type_info structures that are in this module */
      int size;                       /* Number of types in this module */
      struct swig_module_info *next;  /* Pointer to next element in circularly linked list */
      swig_type_info **type_initial;  /* Array of initially generated type structures */
      swig_cast_info **cast_initial;  /* Array of initially generated casting structures */
      void *clientdata;               /* Language specific module data */
    } swig_module_info;

Each module stores an array of pointers to swig_type_info structures and the number of types in this module. So when a second module is loaded, it finds the swig_module_info structure for the first module and searches the array of types. If any of its own types are in the first module and have already been loaded, it uses those swig_type_info structures rather than creating new ones. These swig_module_info structures are chained together in a circularly linked list.
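A minimal sketch of that search, assuming a plain linear scan of each module's array (the real runtime may search differently). The reduced structures and the names module_find, ring_find and ring_insert are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct type_info {
    const char *name;           /* mangled type name */
} type_info;

typedef struct module_info {
    type_info **types;          /* array of pointers to this module's types */
    int size;                   /* number of entries in types */
    struct module_info *next;   /* circular: the last module points to the first */
} module_info;

/* Search one module's type array by name. */
static type_info *module_find(const module_info *m, const char *name) {
    int i;
    for (i = 0; i < m->size; i++) {
        if (strcmp(m->types[i]->name, name) == 0)
            return m->types[i];
    }
    return NULL;
}

/* Visit every module in the ring exactly once, starting from start. */
static type_info *ring_find(module_info *start, const char *name) {
    module_info *m = start;
    assert(start != NULL && name != NULL);
    do {
        type_info *t = module_find(m, name);
        if (t != NULL)
            return t;
        m = m->next;
    } while (m != start);
    return NULL;
}

/* Link a newly loaded module into the ring right after an existing one. */
static void ring_insert(module_info *existing, module_info *added) {
    assert(existing != NULL && added != NULL);
    added->next = existing->next;
    existing->next = added;
}
```

Because the list is circular, a search can start from any module and still reach all the others, so a module only needs a pointer to one member of the ring.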
The most critical part of the following typemap is the use of the $1_descriptor special variable:

    %typemap(in) Foo * {
      if ((SWIG_ConvertPtr($input, (void **) &$1, $1_descriptor)) == -1) return NULL;
    }

When placed in a typemap, $1_descriptor is expanded into the SWIGTYPE_* type descriptor object above. As a general rule, you should always use $1_descriptor instead of trying to hard-code the type descriptor name directly.
There is another reason why you should always use the $1_descriptor variable. When this special variable is expanded, SWIG marks the corresponding type as "in use." When type-tables and type information are emitted in the wrapper file, descriptor information is only generated for those datatypes that were actually used in the interface. This greatly reduces the size of the type tables and improves efficiency.
Occasionally, you might need to write a typemap that needs to convert pointers of other types. To handle this, a special macro substitution $descriptor(type) can be used to generate the SWIG type descriptor name for any C datatype. For example:

    %typemap(in) Foo * {
      if ((SWIG_ConvertPtr($input, (void **) &$1, $1_descriptor)) == -1) {
         Bar *temp;
         if ((SWIG_ConvertPtr($input, (void **) &temp, $descriptor(Bar *))) == -1) {
             return NULL;
         }
         $1 = (Foo *) temp;
      }
    }

The primary use of $descriptor(type) is when writing typemaps for container objects and other complex data structures. There are some restrictions on the argument---namely it must be a fully defined C datatype. It cannot be any of the special typemap variables.
In certain cases, SWIG may not generate type-descriptors like you expect. For example, if you are converting pointers in some non-standard way or working with an unusual combination of interface files and modules, you may find that SWIG omits information for a specific type descriptor. To fix this, you may need to use the %types directive. For example:

    %types(int *, short *, long *, float *, double *);

When %types is used, SWIG generates type-descriptor information even if those datatypes never appear elsewhere in the interface file.
A final problem related to the type-checker is the conversion of types in code that is external to the SWIG wrapper file. This situation is somewhat rare in practice, but occasionally a programmer may want to convert a typed pointer object into a C++ pointer somewhere else in their program. The only problem is that the SWIG type descriptor objects are only defined in the wrapper code and not normally accessible.
To correctly deal with this situation, the following technique can be used:

    /* Some non-SWIG file */

    /* External declarations */
    extern void *SWIG_TypeQuery(const char *);
    extern int   SWIG_ConvertPtr(PyObject *, void **ptr, void *descr);

    void foo(PyObject *o) {
       Foo *f;
       static void *descr = 0;
       if (!descr) {
          descr = SWIG_TypeQuery("Foo *");    /* Get the type descriptor structure for Foo */
          assert(descr);
       }
       if ((SWIG_ConvertPtr(o,(void **) &f, descr) == -1)) {
           abort();
       }
       ...
    }

Further details about the run-time type checking can be found in the documentation for individual language modules. Reading the source code may also help. The file common.swg in the SWIG library contains all of the source code for type-checking. This code is also included in every generated wrapper file, so you can probably just look at the output of SWIG to get a better sense of how types are managed.
I have added a description to swiginit.swg describing how types are loaded, and below is a copy of it.
* Type initialization:
* This problem is made tough by the requirement that no dynamic
* memory is used. Also, since swig_type_info structures store pointers to
* swig_cast_info structures and swig_cast_info structures store pointers back
* to swig_type_info structures, we need some lookup code at initialization.
* The idea is that swig generates all the structures that are needed.
* The runtime then collects these partially filled structures.
* The SWIG_InitializeModule function takes these initial arrays out of
* swig_module, and does all the lookup, filling in the swig_module.types
* array with the correct data and linking the correct swig_cast_info
* structures together.
*
* The generated swig_type_info structures are assigned statically to an initial
* array. We just loop through that array, and handle each type individually.
* First we look up whether this type has already been loaded, and if so, use the
* loaded structure instead of the generated one. Then we have to fill in the
* cast linked list. The cast data is initially stored in something like a
* two-dimensional array. Each row corresponds to a type (there are the same
* number of rows as there are in the swig_type_initial array). Each entry in
* a column is one of the swig_cast_info structures for that type.
* The cast_initial array is actually an array of arrays, because each row has
* a variable number of columns. So to actually build the cast linked list,
* we find the array of casts associated with the type, and loop through it
* adding the casts to the list. The one last trick we need to do is making
* sure the type pointer in the swig_cast_info struct is correct.
*
* First off, we look up the cast->type name to see if it is already loaded.
* There are three cases to handle:
*  1) If the cast->type has already been loaded AND the type we are adding
*     casting info to has not been loaded (it is in this module), THEN we
*     replace the cast->type pointer with the type pointer that has already
*     been loaded.
*  2) If BOTH types (the one we are adding casting info to, and the
*     cast->type) are loaded, THEN the cast info has already been loaded by
*     the previous module so we just ignore it.
*  3) Finally, if cast->type has not already been loaded, then we add that
*     swig_cast_info to the linked list (because the cast->type pointer will
*     be correct).
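The three cases can be sketched as a single helper in C. This is an illustration under assumptions, not the code in swiginit.swg: the structures are reduced, the names link_cast and add_cast are hypothetical, and the sketch assumes that in case 1 the cast is linked into the list after its type pointer is replaced.

```c
#include <assert.h>
#include <stddef.h>

struct cast_info;

typedef struct type_info {
    const char *name;
    struct cast_info *cast;   /* head of this type's cast list */
} type_info;

typedef struct cast_info {
    type_info *type;
    struct cast_info *next;
    struct cast_info *prev;
} cast_info;

/* Push cast onto the front of owner's doubly linked cast list. */
static void link_cast(type_info *owner, cast_info *cast) {
    cast->prev = NULL;
    cast->next = owner->cast;
    if (owner->cast != NULL)
        owner->cast->prev = cast;
    owner->cast = cast;
}

/*
 * owner_loaded:  nonzero if owner was already loaded by an earlier module.
 * loaded_target: the already-loaded structure for cast->type, or NULL if
 *                cast->type has not been loaded before.
 * Returns 1 if the cast was linked and 0 if it was ignored.
 */
static int add_cast(type_info *owner, int owner_loaded,
                    cast_info *cast, type_info *loaded_target) {
    assert(owner != NULL && cast != NULL);
    if (loaded_target != NULL) {
        if (owner_loaded)
            return 0;                 /* case 2: an earlier module handled it */
        cast->type = loaded_target;   /* case 1: reuse the loaded structure */
    }
    link_cast(owner, cast);           /* cases 1 and 3 */
    return 1;
}
```

No memory is allocated anywhere in the sketch: the cast structures already exist in generated static arrays, and initialization only rewrites pointers, which matches the no-dynamic-memory requirement stated in the comment.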
I have added a description of how to write runtime support for new language modules in Doc/Devel/runtime.txt. Here is a copy of it.
This file describes the necessary functions and interfaces a language module needs to implement to take advantage of the run time type system. I assume you have read the run-time section of the Typemaps chapter in the SWIG documentation.

Last updated: January 23, 2005

The file we are concerned with here should be named langrun.swg. A good example of a simple file is the Lib/mzscheme/mzrun.swg file. First, a few requirements and notes:

1) Every function in this file should be declared static.

2) It should be inserted into the runtime section of the _wrap file from your config file. The Lib/swigrun.swg file should be included before this file. That is, you need to have
    %runtime "swigrun.swg"
    %runtime "langrun.swg"

3) You must also include the swiginit.swg file in the init section of the wrapper. That is, you should have
    %insert(init) "swiginit.swg"

4) From module.cxx, you need to call the SwigType_emit_type_table function, as well as register types with SwigType_remember or SwigType_remember_clientdata.

5) By convention, all functions in this file are of the form SWIG_Language_Whatever, and #defines are used to rename SWIG API functions to these function names.

6) You need to call void SWIG_InitializeModule(void *clientdata) from your init function.

-------------------------------------------------------------------------------
Required Functions
-------------------------------------------------------------------------------
    swig_module_info *SWIG_GetModule(void *clientdata);
    void SWIG_SetModule(void *clientdata, swig_module_info *mod);

The SetModule function should store the mod argument into some globally accessible variable in the target language. The action of these two functions is to provide a way for multiple modules to share information.
The SetModule function should create a new global var named something like
    "swig_runtime_data_type_pointer" SWIG_RUNTIME_VERSION SWIG_TYPE_TABLE_NAME
SWIG_RUNTIME_VERSION is currently defined as "2", and SWIG_TYPE_TABLE_NAME is defined by the -DSWIG_TYPE_TABLE=mytable option when compiling the wrapper.

Alternatively, if the language supports modules, a module named
    "swig_runtime_data" SWIG_RUNTIME_VERSION
can be created, and a global variable named
    "type_table" SWIG_TYPE_TABLE_NAME
can be created inside it. The most common approach is to store the mod pointer in some global variable in the target language, but if the language provides an alternative place to store data (like the chicken module), then that is good too.

The way the code is set up, SetModule should only be called when GetModule returns NULL, and if SetModule is called a second time, the behavior is undefined. Just make sure it doesn't crash in the unlikely event that SetModule is called twice.

There are two options here.

1) The preferred approach is for GetModule and SetModule to not require a clientdata pointer. If you can at all avoid it, please do so. Here, you would write
    swig_module_info *SWIG_Language_GetModule();
    void SWIG_Language_SetModule(swig_module_info *mod);
and then add
    #define SWIG_GetModule(clientdata) SWIG_Language_GetModule()
    #define SWIG_SetModule(cd, ptr) SWIG_Language_SetModule(ptr)
You would then call
    SWIG_InitializeModule(0)

2) If GetModule and SetModule need to take a custom pointer (most notably an environment pointer, see tcl or mzscheme), then you should write
    swig_module_info *SWIG_Language_GetModule(void *clientdata);
    void SWIG_Language_SetModule(void *clientdata, swig_module_info *mod);
and also define
    #define SWIG_GetModule(cd) SWIG_Language_GetModule(cd)
    #define SWIG_SetModule(cd, ptr) SWIG_Language_SetModule(cd, ptr)
    #define SWIG_MODULE_CLIENTDATA_TYPE Whatever
SWIG_MODULE_CLIENTDATA_TYPE should be defined to whatever the type of clientdata is.
You would then call SWIG_InitializeModule(clientdata), and clientdata would get passed to GetModule and SetModule. clientdata will not be stored and will only be referenced during the InitializeModule call. After InitializeModule returns, clientdata does not need to be valid any more.

This method is not preferred, because it makes external access to the type system more complicated. See the Modules chapter of the documentation, and read the "External access to the run-time" section. Then take a look at Lib/runtime.swg. Anybody that calls SWIG_TypeQuery needs to pass along the clientdata pointer, and that is the reason for defining SWIG_MODULE_CLIENTDATA_TYPE.

-------------------------------------------------------------------------------
Standard Functions
-------------------------------------------------------------------------------
These functions are not required and their API is not formalized, but almost all language modules implement them for consistency across languages. Throughout this discussion, I will use LangType to represent the underlying language type (C_word in chicken, Scheme_Object * in mzscheme, PyObject * in python, etc).

    LangType SWIG_NewPointerObj(void *ptr, swig_type_info *type, int flags);

Create and return a new pointer object that has both ptr and type. For almost all language modules, flags is used for ownership. If flags==1, then the created pointer should be registered to be garbage collected.

    int SWIG_ConvertPtr(LangType obj, void **result, swig_type_info *type, int flags);

Convert a language wrapped pointer into a void *. The pointer is returned in result, and the function should return 0 on success, non-zero on error. A sample ConvertPtr is given here:

    swig_cast_info *cast;

    if (<obj is a wrapped pointer>) {
      cast = SWIG_TypeCheck(<type name stored in obj>, type);
      cast = SWIG_TypeCheckStruct(<swig_type_info stored in obj>, type);
      if (cast) {
        *result = SWIG_TypeCast(cast, <pointer stored in obj>);
        return 0;
      }
    }
    return 1;

Either TypeCheck or TypeCheckStruct can be called, depending on how the pointer is wrapped in LangType.
If obj stores the void pointer and the type name, then the TypeCheck function should be used, while if obj stores the void pointer and a pointer to the swig_type_info structure, then the TypeCheckStruct function should be called. TypeCheckStruct is slightly faster, since it does a pointer comparison instead of a strcmp.

    void *SWIG_MustGetPtr(LangType obj, swig_type_info *type, int flags,
                          int argnum, const char *func_name) {
      void *result;
      if (SWIG_ConvertPtr(obj, &result, type, flags)) {
        generate runtime type error ("Error in func_name, expected a" +
                                     type->str ? type->str : "void *" +
                                     "at argument number" + argnum);
      }
      return result;
    }

This function is optional, and the number and type of parameters can be different, but it is useful for typemap purposes:

    %typemap(in) SWIGTYPE *, SWIGTYPE &, SWIGTYPE [] {
      $1 = ($1_ltype)SWIG_MustGetPtr($input, $descriptor, 0, $argnum, FUNC_NAME);
    }
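To make the ConvertPtr and MustGetPtr pattern concrete, here is a small self-contained sketch for an imaginary target language whose wrapped pointer stores the void pointer together with a pointer to its type structure (the TypeCheckStruct case). Everything in it is an assumption made for illustration: lang_obj, type_check_struct, lang_convert_ptr and lang_must_get_ptr are hypothetical names, the flags argument is left out, and the runtime type error is replaced by writing a message into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct cast_info;

typedef struct type_info {
    const char *str;               /* human readable name, may be NULL */
    struct cast_info *cast;        /* types that can be cast into this type */
} type_info;

typedef struct cast_info {
    type_info *type;
    void *(*converter)(void *);    /* NULL when no adjustment is needed */
    struct cast_info *next;
} cast_info;

/* Hypothetical wrapped pointer object of the target language. */
typedef struct lang_obj {
    void *ptr;
    type_info *type;
} lang_obj;

/* Pointer-comparison type check: no strcmp is needed. */
static cast_info *type_check_struct(type_info *from, type_info *to) {
    cast_info *c;
    for (c = to->cast; c != NULL; c = c->next) {
        if (c->type == from)
            return c;
    }
    return NULL;
}

/* Returns 0 on success and nonzero on error. */
static int lang_convert_ptr(lang_obj *obj, void **result, type_info *type) {
    cast_info *c;
    if (obj == NULL || obj->type == NULL)
        return 1;
    c = type_check_struct(obj->type, type);
    if (c == NULL)
        return 1;
    *result = c->converter ? c->converter(obj->ptr) : obj->ptr;
    return 0;
}

/* On failure, writes an error message into msg and returns NULL. */
static void *lang_must_get_ptr(lang_obj *obj, type_info *type, int argnum,
                               const char *func_name,
                               char *msg, size_t msg_len) {
    void *result = NULL;
    assert(type != NULL && func_name != NULL && msg != NULL);
    if (lang_convert_ptr(obj, &result, type)) {
        snprintf(msg, msg_len,
                 "Error in %s, expected a %s at argument number %d",
                 func_name, type->str ? type->str : "void *", argnum);
        return NULL;
    }
    return result;
}
```

A real language module would raise the target language's own exception or error at the point where this sketch fills msg, and would pass flags through to its conversion function.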