Array Length Inference for C Library Bindings

This research was conducted by Alisa Maas, Henrique Nazaré, and Ben Liblit. The paper will appear in the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016).

This paper received an ACM SIGSOFT Distinguished Paper Award.

Abstract

Simultaneous use of multiple programming languages (polyglot programming) assists in creating efficient, coherent, modern programs in the face of legacy code. However, manually creating bindings to low-level languages like C is tedious and error-prone. We offer relief in the form of an automated suite of analyses, designed to enhance the quality of automatically produced bindings. These analyses recover high-level array length information that is missing from C's type system. We emit annotations in the style of GObject-Introspection, which produces bindings from annotations on function signatures. We annotate each array argument as terminated by a special sentinel value, fixed-length, or of length determined by another argument. These properties help produce more idiomatic, efficient bindings. We correctly annotate at least 70% of all arrays with these length types, and our results are comparable to those produced by human annotators, but take far less time to produce.
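
Illustration

As a rough illustration of the three length categories named in the abstract, the sketch below shows how each might be written as a GObject-Introspection annotation on a C function. The function names (sum_until_zero, sum_rgb, sum_n) and their bodies are hypothetical and are not taken from the paper or from any library it analyzes; only the (array ...) annotation attributes follow GObject-Introspection's documented style.

```c
#include <stddef.h>

/* Hypothetical functions, invented for illustration only. */

/**
 * sum_until_zero:
 * @values: (array zero-terminated=1): integers ending with a 0 sentinel
 *
 * Returns: the sum of the elements before the sentinel
 */
int sum_until_zero(const int *values)
{
    int total = 0;
    /* The sentinel value 0 marks the end of the array. */
    for (; *values != 0; ++values)
        total += *values;
    return total;
}

/**
 * sum_rgb:
 * @rgb: (array fixed-size=3): exactly three colour components
 *
 * Returns: the sum of the three components
 */
int sum_rgb(const int *rgb)
{
    /* The length is a constant that C's type system does not record. */
    return rgb[0] + rgb[1] + rgb[2];
}

/**
 * sum_n:
 * @values: (array length=n): integers to add
 * @n: number of elements in @values
 *
 * Returns: the sum of the first @n elements
 */
int sum_n(const int *values, size_t n)
{
    int total = 0;
    /* The length is carried by a separate argument. */
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}
```

All three functions take the same C type, const int *, so a binding generator working from the signature alone cannot tell them apart. With the annotations, a generated binding can accept a native list or array and supply the sentinel, check the fixed size, or fill in the length argument itself.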

Full Paper

The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.