Automatic Generation of Library Bindings Using Static Analysis

2009 Conference on Programming Language Design and Implementation (PLDI 2009)
Joint work with Tristan Ravitch, Eric Aderhold, and Ben Liblit

Abstract

High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, time-tested, meritorious code that are impractical to recreate from scratch. Cross-language bindings can expose low-level C code to high-level languages. Unfortunately, writing bindings by hand is tedious and error-prone, while mainstream binding generators require extensive manual annotation or fail to offer the language features that users of modern languages have come to expect.

We present an improved binding-generation strategy based on static analysis of unannotated library source code. We characterize three high-level idioms that are not uniquely expressible in C’s low-level type system: array parameters, resource managers, and multiple return values. We describe a suite of interprocedural analyses that recover this high-level information, and we show how the results can be used in a binding generator for the Python programming language. In experiments with four large C libraries, we find that our approach avoids the mistakes characteristic of hand-written bindings while offering a level of Python integration unmatched by prior automated approaches. Among the thousands of functions in the public interfaces of these libraries, roughly 40% exhibit the behaviors detected by our static analyses.
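To make the three idioms concrete, the sketch below wraps three real C standard library functions with Python's ctypes module. qsort takes an array as a bare pointer plus a separate length. fopen and fclose form an acquire/release pair. strtol returns a second result through its endptr out-parameter. The wrappers (sorted_ints, c_file, parse_long) are hypothetical, hand-written illustrations of the kind of Pythonic interface the abstract describes. They are not output of the paper's binding generator. The sketch assumes a Unix-like system where ctypes.CDLL(None) exposes the C library.

```python
import ctypes
import os
import tempfile
from contextlib import contextmanager

# Assumption: a Unix-like system where dlopen(NULL) exposes the C library.
libc = ctypes.CDLL(None)

# Idiom 1: array parameter.
# C: void qsort(void *base, size_t nmemb, size_t size,
#               int (*compar)(const void *, const void *));
# `base` is just a pointer; only `nmemb` reveals that it is an array.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))
libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]


def _compare_ints(a, b):
    # Return negative, zero, or positive, as qsort expects.
    return (a[0] > b[0]) - (a[0] < b[0])


# Keep a module-level reference so the callback is not garbage-collected.
_CMP = CMPFUNC(_compare_ints)


def sorted_ints(values):
    """Hypothetical wrapper: a Python list in, a sorted list out."""
    n = len(values)
    arr = (ctypes.c_int * n)(*values)  # copy the list into a C array
    libc.qsort(arr, n, ctypes.sizeof(ctypes.c_int), _CMP)
    return list(arr)


# Idiom 2: resource manager.
# C: FILE *fopen(const char *path, const char *mode); int fclose(FILE *fp);
libc.fopen.restype = ctypes.c_void_p
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fclose.restype = ctypes.c_int
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fputs.restype = ctypes.c_int
libc.fputs.argtypes = [ctypes.c_char_p, ctypes.c_void_p]


@contextmanager
def c_file(path, mode):
    """Hypothetical wrapper: pairs every fopen with exactly one fclose."""
    fp = libc.fopen(os.fsencode(path), mode.encode())
    if not fp:
        raise OSError(f"fopen failed for {path!r}")
    try:
        yield fp
    finally:
        libc.fclose(fp)


# Idiom 3: multiple return values.
# C: long strtol(const char *nptr, char **endptr, int base);
# `endptr` is an out-parameter: strtol writes it and never reads it.
libc.strtol.restype = ctypes.c_long
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p),
                        ctypes.c_int]


def parse_long(text, base=10):
    """Hypothetical wrapper: returns (value, unparsed_rest) as a tuple."""
    raw = text.encode()
    end = ctypes.c_char_p()
    value = libc.strtol(raw, ctypes.byref(end), base)
    # `end` now points into `raw`, which is still alive here.
    return value, end.value.decode()


if __name__ == "__main__":
    print(sorted_ints([3, 1, 2]))   # [1, 2, 3]
    print(parse_long("42 apples"))  # (42, ' apples')
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with c_file(path, "w") as fp:
        libc.fputs(b"hello from C stdio\n", fp)
    with open(path) as f:
        print(f.read(), end="")     # hello from C stdio
```

In each case the C signature alone does not reveal the idiom. A binding writer, or an analysis, must know that base points to nmemb elements, that every fopen must be matched by one fclose, and that endptr is written rather than read.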

The full paper is available as a PDF document. See also the suggested BibTeX citation.

Note for Mac users: the built-in PDF viewer in OS X may not display Figure 2 of the PDF correctly. A separate image of Figure 2 is also provided.
