Extracting Output Formats from Executables

This research was conducted by Junghee Lim, Thomas Reps, and Ben Liblit. The paper was published in the proceedings of the 13th Working Conference on Reverse Engineering (WCRE 2006).

Abstract

We describe the design and implementation of FFE/x86 (File-Format Extractor for x86), an analysis tool that works on stripped executables (i.e., neither source code nor debugging information need be available) and extracts output data formats, such as file formats and network packet formats. We first construct a Hierarchical Finite State Machine (HFSM) that over-approximates the output data format. An HFSM defines a language over the operations used to generate output data. We use Value-Set Analysis (VSA) and Aggregate Structure Identification (ASI) to annotate HFSMs with information that partially characterizes some of the output data values. VSA determines an over-approximation of the set of addresses and integer values that each data object can hold at each program point, and ASI analyzes memory accesses in the program to recover information about the structure of aggregates. A series of filtering operations is performed to over-approximate an HFSM with a finite-state machine, which can result in a final answer that is easier to understand. Our experiments with FFE/x86 uncovered a possible bug in the image-conversion utility png2ico.
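The step of over-approximating an HFSM with a finite-state machine can be illustrated with a minimal Python sketch. It is not FFE/x86's filtering operations. It assumes a hypothetical HFSM encoding in which each component machine has a start state, a set of final states, and edges labeled either with an output operation ("out") or with a call to another machine ("call"). The names `Machine`, `flatten`, `eps_closure`, and `accepts` are illustrative. The sketch inlines every machine once and turns each call into epsilon edges that enter the callee's start state and leave from its final states. Because no call stack is kept, the resulting automaton accepts every sequence the HFSM can produce, and possibly more.

```python
from dataclasses import dataclass

# Hypothetical HFSM encoding (an assumption, not FFE/x86's data structures):
# each edge is (src, kind, label, dst), where kind is "out" (label is an output
# operation) or "call" (label names another component machine).
@dataclass
class Machine:
    start: str
    finals: set
    edges: list

def flatten(hfsm, root):
    """Over-approximate an HFSM by an epsilon-NFA over output operations.

    Each component machine is inlined exactly once. A call edge becomes an
    epsilon edge into the callee's start state, plus epsilon edges from every
    callee final state back to the call's return state. Because returns are
    not matched to their calls, the NFA may accept sequences the HFSM cannot
    generate, but it accepts every sequence the HFSM can generate.
    """
    sym, eps = [], []  # (src, symbol, dst) and (src, dst); states are (machine, state)
    for name, m in hfsm.items():
        for src, kind, label, dst in m.edges:
            s, d = (name, src), (name, dst)
            if kind == "out":
                sym.append((s, label, d))
            else:
                callee = hfsm[label]
                eps.append((s, (label, callee.start)))
                eps.extend(((label, f), d) for f in callee.finals)
    start = (root, hfsm[root].start)
    finals = {(root, f) for f in hfsm[root].finals}
    return start, finals, sym, eps

def eps_closure(states, eps):
    """All states reachable from `states` via epsilon edges alone."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for a, b in eps:
            if a == q and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def accepts(nfa, word):
    """Subset simulation of the epsilon-NFA on a sequence of output operations."""
    start, finals, sym, eps = nfa
    cur = eps_closure({start}, eps)
    for w in word:
        cur = eps_closure({d for s, a, d in sym if s in cur and a == w}, eps)
    return bool(cur & finals)

# Hypothetical recursive format: a chunk writes "begin", optionally a nested
# chunk, then "end", so the HFSM generates begin^n end^n for n >= 1.
HFSM = {
    "Chunk": Machine(start="p0", finals={"p3"}, edges=[
        ("p0", "out", "begin", "p1"),
        ("p1", "call", "Chunk", "p2"),
        ("p1", "out", "end", "p3"),
        ("p2", "out", "end", "p3"),
    ]),
}
nfa = flatten(HFSM, "Chunk")
print(accepts(nfa, ["begin", "begin", "end", "end"]))  # True: in the HFSM language
print(accepts(nfa, ["begin", "begin", "end"]))         # True: over-approximation
print(accepts(nfa, ["end"]))                           # False
```

The unbalanced sequence begin, begin, end is accepted even though the recursive `Chunk` machine cannot generate it; this loss of precision is the cost of replacing a context-free description with a finite-state one, in exchange for a result that is simpler to inspect.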

Full Paper

The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.

Presentation Slides

This work was presented at WCRE 2006 in Benevento, Italy. Slides from that talk are available as a single PowerPoint document.