Automatic Generation of Library Bindings Using Static Analysis

This paper appeared in the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2009.

Slides from the talk are available in PDF.

This reasearch was conducted by Tristan Ravitch, Steve Jackson, Eric Aderhold, and Ben Liblit.

The source code is available under the BSD license on github. Additionally, some binaries are available:

The i386 binaries are built on RHEL5 and should run in any reasonable environment. The x86_64 packages were built on RHEL6 and probably require glibc >= 2.12. They both depend on the LLVM 3.0 shared library.

Abstract

High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, time-tested, meritorious code that are impractical to recreate from scratch. Cross-language bindings can expose low-level C code to high-level languages. Unfortunately, writing bindings by hand is tedious and error-prone, while mainstream binding generators require extensive manual annotation or fail to offer the language features that users of modern languages have come to expect.

We present an improved binding-generation strategy based on static analysis of unannotated library source code. We characterize three high-level idioms that are not uniquely expressible in C’s low-level type system: array parameters, resource managers, and multiple return values. We describe a suite of interprocedural analyses that recover this high-level information, and we show how the results can be used in a binding generator for the Python programming language. In experiments with four large C libraries, we find that our approach avoids the mistakes characteristic of hand-written bindings while offering a level of Python integration unmatched by prior automated approaches. Among the thousands of functions in the public interfaces of these libraries, roughly 40% exhibit the behaviors detected by our static analyses.

Paper

The full paper is available here in PDF and Postscript.