A Semantics-Based Approach to Malware Detection

Mila Dalla Preda, Mihai Christodorescu, Somesh Jha, Saumya Debray

January 17-19, 2007
Hide the Contact Info
Photo of Mihai Christodorescu
Mihai Christodorescu
Doctoral Candidate
1210 W Dayton St
Office 7372
Madison, WI 53706-1685
Curriculum vitæ: online PDF US letter (or A4)
[an error occurred while processing this directive]
Telephone: +1 608 262-6625
Fax: +1 608 262-9777
Website: http://www.cs.wisc.edu/~mihai/
Email: mihai@cs.wisc.edu
ICQ: 3954659
AIM: yodMihai
Yahoo! IM: warkda
Skype: warkdarrior
LinkedIn: view my profile
Google Chat/XMPP: mihaic@gmail.com

This paper is a result of research work on behavior-based malware detection and appeared in the Proceedings of the 34th Annual Symposium on Principles of Programming Languages (POPL 2007), January 17-19, 2007, Nice, France.

The work of Mila Dalla Preda was partially supported by the MUR project “InterAbstract” and by the FIRB project “Abstract Interpretation and Model Checking for the verification of embedded systems.”

The work of Mihai Christodorescu and Somesh Jha was supported in part by the National Science Foundation under grants CNS-0448476 and CNS-0627501.

The work of Saumya Debray was supported in part by the National Science Foundation under grants EIA-0080123, CCR-0113633, and CNS-0410918.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the above government agencies or the U.S. Government.



Malware detection is a crucial aspect of software security. Current malware detectors work by checking for “signatures,” which attempt to capture (syntactic) characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes such detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter syntactic properties of the malware byte sequence without significantly affecting their execution behavior.

This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behaviors of malware as well as the program being checked for infection, and uses abstract interpretation to “hide” irrelevant aspects of these behaviors. As a concrete application of our approach, we show that the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers.

Copyright © 2006 Mihai Christodorescu. All rights reserved.
Maintained by Mihai Christodorescu (http://www.cs.wisc.edu/~mihai/).
Created: Thu Sep 14 17:02:29 2006
Last modified: Thu Jul 5 11:56:17 CDT 2007