Static-Analysis for Stream Balance
Michael Davies and Hailey Fields
Stream-Dataflow is a recently proposed architecture which incorporates
a classic processor core with an explicit dataflow component. This
architecture provides stream commands as instructions in the
Instruction Set Architecture (ISA). Programs for Stream-Dataflow must
ensure that commands balance the streams on all vector ports. Failure
to do this results in a hardware deadlock, which requires a device
reset in order to recover.
In this project, we explore the use of static program analysis
techniques to help programmers catch this class of stream-imbalance
bugs. We employ dataflow analysis and develop a dataflow domain over
which we can perform analysis. In addition, we explore how to augment
our engine with an SMT solver to enable formal stream balance
verification, and potentially expose cases where imbalance can occur.
We evaluate our engine by designing tests to stress scenarios that are
known to be problematic. In addition, we demonstrate the real-world
applications for this project by testing industry code.
Invariant Inference for Probabilistic Programs
Jialu Bao
Weakest preexpectation calculus was introduced for reasoning the
expectation of program states and could be seen as a probabilistic
counterpart of Dijkstra's weakest precondition calculus that reasons
realized program states. In this project, we work on inferring loop
invariants in Weakest preexpectation calculus using data-driven
techniques. Loop invariants are the hardest part to infer as their
rule involves calculating a least fixed point, and efficient inference
of them would enable automatic reasoning on many interesting
probabilistic programs. Our current method first formulates candidate
symbolic expressions through gradient descent and then verifies them
through symbolic calculus. Our experiments show that it could often
find candidates of the same shape as the actual invariants.
A Data Flow Domain for Finding Use-After-Free Bugs
Tianhao Chai
In the C and the C++ programming languages, programs are expected to
perform heap memory management by themselves via the common library
calls `malloc` and `free`. The typical use-after-free (UAF) problem
arises when the memory region associated with a pointer has been freed
but is subsequently used or addressed. Such bugs open up a range of
opportunities for security exploits in the form of memory corruption
and arbitrary code execution. Albeit its impact, heap memory
management in a large software project is often challenging, where
tracking heap liveness for an increasing number of variables becomes
increasingly difficult by hand. While existing software systems,
e.g., valgrind, can perform heap-memory tracking at runtime, the
profiling approach comes with a moderate runtime overhead and limited
code coverage. This work proposes a static analysis approach to UAF
bugs: we construct a data flow domain on the language of LLVM
intermediate representation (IR) to track variable liveness on the
heap, and apply it to IR that is generated from C source codes.
Bounded Symbolic Model Checking in the Union-Bound Logic
Zach Susag
The union bound logic is a probabilistic Hoare logic based on the
union bound, a basic result from probability theory. While this
principle is simple, it is an extremely common tool in the analysis of
randomized algorithms. From a formal verification perspective, the
union bound allows compositional reasoning over possible ways an
algorithm may go wrong. To automate this verification, this work
proposes a symbolic model checking algorithm which uses KLEE to
generate path conditions and Z3 to verify whether or not a judgement
in traditional Hoare logic is valid. Using this approach, we are able
to automatically prove accuracy properties for non-interactive and
interactive programs written in a typical probabilistic, imperative
programming language. Notably, this algorithm uses a purely symbolic
approach, rather than enumerative model-checking as found in PRISM,
another probabilistic model checker.
Solving Program Path Constraints using Automatic Differentiation
Deepak Sirone and Anjali
Solving path constraints in programs is crucial for generating inputs
which trigger a particular path in a program. Past work in symbolic
execution uses an SMT solver to solve path constraints. However, it is
very hard for SMT solvers to solve general constraints involving
non-linear functions of the inputs. Chen et. al. convert branch
conditions into arithmetic formulas in their Angora fuzzer, and use
gradient descent rather than a call to an SMT solver to try to flip a
branch. This approach improves the speed at which constraints are
solved in practise. In Angora, each path constraint is extracted from
the branch condition and a numerical approximation is used to compute
the partial derivative in a particular direction. However, in order to
compute the gradient of f(x), they use the mathematician's notation of
∂f(x)/∂xi=(f(x+∂)-f(x))/∂. The Angora paper suggests using
∂ = 1. However, in many cases, this leads to a bad approximation of the
partial derivative.
In this work, we implement automatic differentiation (or divided
differencing) as another way of finding the gradient. To achieve this,
we implement a parse tree for expressions involving integer and float
addition, subtraction and multiplication using the use-def chains
provided by LLVM. For each comparison, we output the partial
derivatives with respect to all the leaves of the expression tree to
shared memory which are then read by the gradient descent code. For
each leaf node of the comparison expression, we log the corresponding
taint labels. These labels are used to compute the offsets into the
input that affect the values of the leaf nodes. We measure the
performance of our implementation on the LAVA-M dataset with respect
to the inputs generated per second and also on the line and branch
coverage.
Automating Probabilistic Proofs in Union-Bound Hoare Logic
Jinwoo Kim
Methods for verifying probabilistic programs have recently gained much
interest. Unfortunately, most of these systems lack automation and
require that the reasoning be done by hand. Part of the reason for
this is that probabilistic programs require reasoning about
probabilities and distributions, in addition to the already difficult
task of program verification.
For this project, I present a basic automation scheme for union bound
Hoare logic, which is a lightweight logic for probabilistic
systems. The core component is parametrized axioms, which are
user-provided facts about distributions that may be instantiated to
complete the proof. I will explain how probabilistic proofs reduce to
SMT solving in this case, along with a prototype implementation of the
automation scheme.
Automatic Differentiation for Angora
Yuhao Zhang and Rui Huang
Angora Fuzzer tries to explore every possible branch in each
conditional statement by gradient descent. It calculates the partial
derivatives by numerical approximation, which requires two executions
of the target program and generates imprecise partial derivatives. To
mitigate the inefficiency and imprecision of numerical approximation,
we propose to use automatic differentiation (AD) to calculate the
partial derivatives. After using AD, Angora only needs to execute the
binary once to extract both the output and the precise derivatives at
the same time.
We will talk about how to integrate dynamic automatic differentiation
into Angora Fuzzer in three parts. First, we implement an Int class to
replace all integer variables. The Int class allows calculating the
derivatives alongside performing arithmetic operations. Second, we
inject proxy calls in llvm intermediate representation to retrieve
derivatives calculated in the Int class for Angora Fuzzer. Third, we
modify Angora Fuzzer to use derivatives calculated by AD if the
derivatives are available. Otherwise, Angora will still use numerical
approximation.
We will also present the results of AD-assisted Angora Fuzzer on some
test programs.
Whole-program paths for multi-threaded programs
Chen Chen and Xin Yu
We will present ideas for extending Larus's whole-program paths to
multi-threaded programs. Our approach involves three pieces: the
collection of thread-specific path traces, a new post-processing step
to sort and concatenate traces, and modifications to the compression
algorithm to handle multi-threaded path traces. We will also discuss
limitations in implementing whole-program paths in LLVM and
modifications needed to work around them.
Time: 11:00-11:18
Time: 11:18-11:36
Time: 11:36-11:54
Time: 11:54-12:12
Presentations on Friday, 5/1/2020, 11:00 AM - 12:15 PM
Webex link
for the class.
Time: 11:00-11:18
Time:11:18-11:36
Time: 11:36-11:54
Time: 11:54-12:12