CS 701: Course Project Presentations
Section 1
Spring 2020



Presentations on Wednesday, 4/29/2020, 11:00 AM - 12:15 PM


Webex link for the class.

Static-Analysis for Stream Balance

Michael Davies and Hailey Fields
Time: 11:00-11:18

Stream-Dataflow is a recently proposed architecture which incorporates a classic processor core with an explicit dataflow component. This architecture provides stream commands as instructions in the Instruction Set Architecture (ISA). Programs for Stream-Dataflow must ensure that commands balance the streams on all vector ports. Failure to do this results in a hardware deadlock, which requires a device reset in order to recover.

In this project, we explore the use of static program analysis techniques to help programmers catch this class of stream-imbalance bugs. We employ dataflow analysis and develop a dataflow domain over which we can perform analysis. In addition, we explore how to augment our engine with an SMT solver to enable formal stream balance verification, and potentially expose cases where imbalance can occur.

We evaluate our engine by designing tests to stress scenarios that are known to be problematic. In addition, we demonstrate the real-world applications for this project by testing industry code.


Invariant Inference for Probabilistic Programs

Jialu Bao
Time: 11:18-11:36

Weakest preexpectation calculus was introduced for reasoning the expectation of program states and could be seen as a probabilistic counterpart of Dijkstra's weakest precondition calculus that reasons realized program states. In this project, we work on inferring loop invariants in Weakest preexpectation calculus using data-driven techniques. Loop invariants are the hardest part to infer as their rule involves calculating a least fixed point, and efficient inference of them would enable automatic reasoning on many interesting probabilistic programs. Our current method first formulates candidate symbolic expressions through gradient descent and then verifies them through symbolic calculus. Our experiments show that it could often find candidates of the same shape as the actual invariants.


A Data Flow Domain for Finding Use-After-Free Bugs

Tianhao Chai
Time: 11:36-11:54

In the C and the C++ programming languages, programs are expected to perform heap memory management by themselves via the common library calls `malloc` and `free`. The typical use-after-free (UAF) problem arises when the memory region associated with a pointer has been freed but is subsequently used or addressed. Such bugs open up a range of opportunities for security exploits in the form of memory corruption and arbitrary code execution. Albeit its impact, heap memory management in a large software project is often challenging, where tracking heap liveness for an increasing number of variables becomes increasingly difficult by hand. While existing software systems, e.g., valgrind, can perform heap-memory tracking at runtime, the profiling approach comes with a moderate runtime overhead and limited code coverage. This work proposes a static analysis approach to UAF bugs: we construct a data flow domain on the language of LLVM intermediate representation (IR) to track variable liveness on the heap, and apply it to IR that is generated from C source codes.


Bounded Symbolic Model Checking in the Union-Bound Logic

Zach Susag
Time: 11:54-12:12

The union bound logic is a probabilistic Hoare logic based on the union bound, a basic result from probability theory. While this principle is simple, it is an extremely common tool in the analysis of randomized algorithms. From a formal verification perspective, the union bound allows compositional reasoning over possible ways an algorithm may go wrong. To automate this verification, this work proposes a symbolic model checking algorithm which uses KLEE to generate path conditions and Z3 to verify whether or not a judgement in traditional Hoare logic is valid. Using this approach, we are able to automatically prove accuracy properties for non-interactive and interactive programs written in a typical probabilistic, imperative programming language. Notably, this algorithm uses a purely symbolic approach, rather than enumerative model-checking as found in PRISM, another probabilistic model checker.




Presentations on Friday, 5/1/2020, 11:00 AM - 12:15 PM


Webex link for the class.

Solving Program Path Constraints using Automatic Differentiation

Deepak Sirone and Anjali
Time: 11:00-11:18

Solving path constraints in programs is crucial for generating inputs which trigger a particular path in a program. Past work in symbolic execution uses an SMT solver to solve path constraints. However, it is very hard for SMT solvers to solve general constraints involving non-linear functions of the inputs. Chen et. al. convert branch conditions into arithmetic formulas in their Angora fuzzer, and use gradient descent rather than a call to an SMT solver to try to flip a branch. This approach improves the speed at which constraints are solved in practise. In Angora, each path constraint is extracted from the branch condition and a numerical approximation is used to compute the partial derivative in a particular direction. However, in order to compute the gradient of f(x), they use the mathematician's notation of ∂f(x)/∂xi=(f(x+∂)-f(x))/∂. The Angora paper suggests using ∂ = 1. However, in many cases, this leads to a bad approximation of the partial derivative.

In this work, we implement automatic differentiation (or divided differencing) as another way of finding the gradient. To achieve this, we implement a parse tree for expressions involving integer and float addition, subtraction and multiplication using the use-def chains provided by LLVM. For each comparison, we output the partial derivatives with respect to all the leaves of the expression tree to shared memory which are then read by the gradient descent code. For each leaf node of the comparison expression, we log the corresponding taint labels. These labels are used to compute the offsets into the input that affect the values of the leaf nodes. We measure the performance of our implementation on the LAVA-M dataset with respect to the inputs generated per second and also on the line and branch coverage.


Automating Probabilistic Proofs in Union-Bound Hoare Logic

Jinwoo Kim
Time:11:18-11:36

Methods for verifying probabilistic programs have recently gained much interest. Unfortunately, most of these systems lack automation and require that the reasoning be done by hand. Part of the reason for this is that probabilistic programs require reasoning about probabilities and distributions, in addition to the already difficult task of program verification.

For this project, I present a basic automation scheme for union bound Hoare logic, which is a lightweight logic for probabilistic systems. The core component is parametrized axioms, which are user-provided facts about distributions that may be instantiated to complete the proof. I will explain how probabilistic proofs reduce to SMT solving in this case, along with a prototype implementation of the automation scheme.


Automatic Differentiation for Angora

Yuhao Zhang and Rui Huang
Time: 11:36-11:54

Angora Fuzzer tries to explore every possible branch in each conditional statement by gradient descent. It calculates the partial derivatives by numerical approximation, which requires two executions of the target program and generates imprecise partial derivatives. To mitigate the inefficiency and imprecision of numerical approximation, we propose to use automatic differentiation (AD) to calculate the partial derivatives. After using AD, Angora only needs to execute the binary once to extract both the output and the precise derivatives at the same time.

We will talk about how to integrate dynamic automatic differentiation into Angora Fuzzer in three parts. First, we implement an Int class to replace all integer variables. The Int class allows calculating the derivatives alongside performing arithmetic operations. Second, we inject proxy calls in llvm intermediate representation to retrieve derivatives calculated in the Int class for Angora Fuzzer. Third, we modify Angora Fuzzer to use derivatives calculated by AD if the derivatives are available. Otherwise, Angora will still use numerical approximation.

We will also present the results of AD-assisted Angora Fuzzer on some test programs.


Whole-program paths for multi-threaded programs

Chen Chen and Xin Yu
Time: 11:54-12:12

We will present ideas for extending Larus's whole-program paths to multi-threaded programs. Our approach involves three pieces: the collection of thread-specific path traces, a new post-processing step to sort and concatenate traces, and modifications to the compression algorithm to handle multi-threaded path traces. We will also discuss limitations in implementing whole-program paths in LLVM and modifications needed to work around them.