Modeling and Analyzing the Interaction of C and C++ Strings
Gogul Balakrishnan and Naoto Maeda and Sriram Sankaranarayanan and Franjo Ivancic and Aarti Gupta and Rakesh Pothengil
Strings are commonly used in a large variety of software. And yet, they are a common source of bugs involving invalid memory accesses arising due to the misuse of the string manipulation API. Such bugs are often remotely exploitable, leading to severe consequences. Therefore, static detection of invalid memory accesses due to string operations has received much attention, especially for C programs that use the standard C library functions. More recently, software is increasingly being developed in object-oriented languages such as C++ and Java. However, the need to interact with legacy C code and C-based system-level APIs often necessitates the use of a mixed programming paradigm that combines features of high-level object-oriented constructs with calls to standard C library functions. While such programs are commonplace, there has been little research on static analysis of such programs. In this paper, we present memory models for C++ programs that are heap-aware, with an emphasis on modeling dynamically allocated memory, use of null-terminated string buffers, C++ Standard Template Library (STL) classes, and the interactions among these features. We use standard verification techinques such as abstract interpretation and model checking to verify properties over these models to find potential bugs. Our tool can find \emph{several previously unknown bugs} in open-source projects. These bugs are primarily due to the subtle interactions of the intricate C++ programming model with legacy C string API.
Paper available as: [PDF] [Official Version]