Yan Zhai


Computer Scientist / System Engineer

I am a Ph.D student in computer science at the University of Wisconsin-Madison, advised by Professor Michael Swift. I expect to graduate in December 2018 and am currently looking for software engineering jobs. My interests and expertise lie in the security and performance of computer systems and networking, especially IaaS and PaaS clouds, which I have spent most of the past five years working on. You can find more details about my skills in my resume and projects. Contact me if I am the person you are looking to hire!

Contact

Email: yanzhai at cs.wisc.edu
Phone: 6o8-32o-3298
Github
Publications
LinkedIn
Facebook

About Me

I am a Ph.D in computer systems. I hack for security and performance. I have a wonderful life and a beautiful wife. I cook for two fluffy kids.

Work Experience

Google Inc., Mountain View.
2013.5-2013.8, 2014.5-2014.8
I worked as an engineering intern on CPU power modeling and prediction in the infrastructure team. My mentor was Xiao Zhang, and I also received great help from Professors Jason Mars and Lingjia Tang. The outcome of the internship was a model for predicting CPU power on hyperthreading-enabled servers, which was published at USENIX ATC.

Education

University of Wisconsin-Madison
2012-2018
I am completing my Ph.D in computer science here, working on cloud system security. My advisor is Professor Michael Swift. My Ph.D dissertation designs and implements a new cloud authorization framework that securely incorporates program identities and related security properties into the authorization process, and enables new usage patterns such as cross-tenant sharing in the cloud. The research comprises several sub-projects: CQSTR, TapCon, and Latte. These projects address authorization issues at different layers of the cloud software stack. More details can be found in my resume and projects.

Professors Jeffrey Chase and Thomas Ristenpart also actively supervised my research and contributed indispensable insights into my work. I also really enjoyed working with a number of talented researchers here, including but not limited to Qiang Cao, Adam Everspaugh, Robert Jellinek, Liang Wang, and Lichao Yin. They either directly helped with design and implementation or inspired me with brilliant ideas.
Tsinghua University, Beijing.
2009-2012
I received my Master's degree in computer science here, specializing in high performance computing (HPC). I was advised by Professor Wenguang Chen, and I was a main participant in evaluating the top three Chinese supercomputers with a broad range of HPC applications. I also led an Intel-sponsored project evaluating public clouds such as Amazon Web Services for HPC applications. During those three years, I collaborated with several amazing people on different projects, including Professor Xiaosong Ma, Professor Wei Xue, Professor Jidong Zhai, and Mingliang Liu, an outstanding engineer who currently works at Salesforce. I learnt a lot from them.
Beihang University, Beijing.
2005-2009
I received my bachelor's degree in computer science here. My greatest accomplishment here was falling in love with my wife.

Resume

Download

Skills

Expertise

System Security

Performance Analysis

Operating Systems

Distributed Systems

Networking

Programming

C/C++

Bash

Python

Golang

Java

Systems

Linux

Amazon Web Services

OpenStack

Docker

Kubernetes

Soft Skills

Open Minded

Collaboration

Integrity

Easy Going

Ph.D Research

The security of a system ultimately depends on its program identity, i.e., what code runs and how it is configured. However, obtaining trustworthy program identities has historically been hard. Recent progress in DevOps, together with well-grounded trust in public platforms such as OpenStack and Kubernetes, has changed this situation for the cloud environment. My research seeks to incorporate program identities into cloud authorization systems with the help of the IaaS cloud provider, allowing them to authorize access based on both a requester's code identity and its implied security properties, e.g., the firewall policy enforced on it.

CQSTR

CQSTR constructs an abstraction missing from current IaaS clouds, which we call Cloud Containers. A cloud container defines an isolation policy for network access and IaaS services. The IaaS provider makes this policy immutable after creation and enforces it throughout the lifetime of the cloud container. Further, the IaaS provider exports the policy through a trusted database so that third parties can verify the containment of a cloud container, i.e., that secret data flowing into the cloud container cannot be leaked in unwanted ways. We implemented CQSTR on OpenStack Kilo and built a proof-of-concept version for Amazon Web Services by auditing CloudTrail logs. The performance overhead is negligible.
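To make the containment idea concrete, here is a minimal Python sketch. It is not CQSTR's implementation or API: `IsolationPolicy`, `TrustedPolicyRegistry`, and `is_contained` are hypothetical names, the policy fields are assumptions, and the registry is a toy, write-once stand-in for the provider's trusted database. A third party checks that every destination the container may reach, and every IaaS API it may call, is one the verifier already trusts.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class IsolationPolicy:
    """Hypothetical cloud-container policy; frozen=True makes instances immutable."""
    allowed_egress: FrozenSet[str]     # network destinations the container may reach
    allowed_iaas_apis: FrozenSet[str]  # IaaS service calls the container may make


class TrustedPolicyRegistry:
    """Hypothetical stand-in for the provider-run trusted policy database."""

    def __init__(self) -> None:
        self._policies: Dict[str, IsolationPolicy] = {}

    def register(self, container_id: str, policy: IsolationPolicy) -> None:
        # Write-once: the provider refuses to replace a policy after creation.
        if container_id in self._policies:
            raise PermissionError(f"policy for {container_id} is immutable")
        self._policies[container_id] = policy

    def lookup(self, container_id: str) -> IsolationPolicy:
        return self._policies[container_id]


def is_contained(registry: TrustedPolicyRegistry, container_id: str,
                 trusted_sinks: FrozenSet[str], safe_apis: FrozenSet[str]) -> bool:
    """Third-party check: the container can only send data to trusted sinks
    and can only call IaaS APIs the verifier considers safe."""
    policy = registry.lookup(container_id)
    return policy.allowed_egress <= trusted_sinks and policy.allowed_iaas_apis <= safe_apis


# Usage with placeholder addresses and API names.
registry = TrustedPolicyRegistry()
registry.register("cc-1", IsolationPolicy(frozenset({"10.0.0.5"}),
                                          frozenset({"describe_instances"})))
print(is_contained(registry, "cc-1", frozenset({"10.0.0.5"}), frozenset({"describe_instances"})))  # True
print(is_contained(registry, "cc-1", frozenset(), frozenset({"describe_instances"})))  # False
```

The subset test captures the containment argument: if the policy cannot change and every permitted exit is trusted, secret data sent into the container has no untrusted way out. A real deployment would additionally need to authenticate the database itself.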

CQSTR presentation at SoCC 2016 (Keynote format) and paper

TapCon

TapCon is an IaaS-managed container service that attests containers to their code identities. The key feature of TapCon is source-based attestation, i.e., linking a running container to the authenticated source repository and the tools that built it. TapCon leverages the unspoofable nature of the cloud network and allows its users to grant access based on a requesting container's unique network address. Authorization policies are expressed in first-order logic, which makes them flexible and extensible. We implemented TapCon in Docker and the Linux kernel, and we are currently integrating it into Kubernetes. Using TapCon adds virtually no overhead.
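The authorization flow can be pictured with a small Python sketch. It is not TapCon's code or policy language: a hypothetical attestation table maps each container's network address to the source repository, commit, and build tool recorded when it was built, and a policy is modeled as a conjunction of Python predicates over those facts, a simple stand-in for TapCon's first-order-logic policies. The field names, addresses, repository, and commit are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Attestation:
    """Facts the provider is assumed to record when it builds a container."""
    source_repo: str
    commit: str
    builder: str


# A rule is a predicate over an attestation; a policy is the conjunction of its rules.
Rule = Callable[[Attestation], bool]


def authorize(attestations: Dict[str, Attestation], requester_ip: str,
              policy: List[Rule]) -> bool:
    # The cloud network is assumed unspoofable, so the source address
    # identifies the requesting container.
    att = attestations.get(requester_ip)
    if att is None:
        return False  # no attested identity for this address: deny
    # Note: an empty policy admits any attested container.
    return all(rule(att) for rule in policy)


# Usage with placeholder addresses, repository, and commit.
attestations = {
    "10.1.0.7": Attestation("git.example.com/team/web", "3f2a9c1", "docker-build"),
}
policy: List[Rule] = [
    lambda a: a.source_repo == "git.example.com/team/web",
    lambda a: a.builder == "docker-build",
]
print(authorize(attestations, "10.1.0.7", policy))  # True
print(authorize(attestations, "10.1.0.8", policy))  # False
```

The resource owner never talks to the requester's build system directly; it only trusts the provider's mapping from address to attested source, which is what makes the network address usable as an identity.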

TapCon paper and presentation at HotCloud 2017

Latte

Latte is the authorization framework described in the research statement above. It provides a rich basis for authorization. It can authorize operations based on a requester's code identity, which includes source code, build environment, and runtime configuration, as well as third-party endorsements of trustworthiness. Latte supports the layered environments common in cloud computing, such as Docker containers running within virtual machines, and distributed services such as the Spark data-analytics platform. We integrated Latte with OpenStack, Docker, and Spark to demonstrate how Latte can improve security and enable new usage scenarios, such as allowing untrusted parties to compute over private data. Adopting Latte requires few changes to application platforms, and in most cases Latte adds zero overhead. Both CQSTR and TapCon serve as example platforms within Latte.
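One way to picture authorization over layered environments is as a check on a chain of layers, e.g. a Docker container running inside a VM, where the request is allowed only if every layer's code identity is acceptable. The Python sketch below is a hypothetical illustration of that idea, not Latte's design or API; the `Layer` fields, the endorser names, the placeholder digests, and the "every layer needs a trusted endorser" rule are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Layer:
    """One layer of the execution stack, e.g. a VM or a container inside it."""
    kind: str                  # "vm", "container", ...
    code_id: str               # digest standing in for source, build env, and config
    endorsers: FrozenSet[str]  # third parties vouching for this code_id


def authorize_chain(chain: Sequence[Layer], trusted: FrozenSet[str]) -> bool:
    """Allow a request only if every layer in the chain (outermost first)
    is endorsed by at least one party the resource owner trusts."""
    if not chain:
        return False  # no identity at all: deny
    return all(layer.endorsers & trusted for layer in chain)


# Usage with placeholder digests and endorser names.
vm = Layer("vm", "sha256:aaaa", frozenset({"provider"}))
app = Layer("container", "sha256:bbbb", frozenset({"auditor-x"}))
print(authorize_chain([vm, app], frozenset({"provider", "auditor-x"})))  # True
print(authorize_chain([vm, app], frozenset({"provider"})))              # False
```

Checking the whole chain matters because a trustworthy container is only as trustworthy as the VM (and provider) it runs on; a compromised outer layer could observe or alter the inner one.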

The Latte paper is under submission to SoCC 2018. Here is a draft version.

About the Code

The code is available upon email request. I am still organizing the code repositories for the three projects above; there are more than fifteen Git repositories to clean up, so this will take some time.

Other Interesting Projects

Mini Cloud (2013)

Cloud computing promises rapid adaptation to changes in workload by spinning up more virtual machine instances. However, the ability to respond quickly depends on the time it takes the cloud provider to provision a new virtual machine and the time it takes the guest operating system to boot. We find that typical Linux instances used in Amazon’s EC2 cloud can take more than 50 seconds to boot and provide application services.
We describe a new technique that greatly reduces the latency to launch a new instance without requiring OS modifications. In a measurement study, we find that I/O delay to transfer OS code and data from storage is the dominant factor in boot time. Our proposed solution leverages existing Linux ramdisk support to optimize I/O and effectively prefetch the entire OS and application data in one operation. In an evaluation in EC2, we find our approach reduces boot latency by more than 80%.
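The intuition for why a single bulk prefetch helps can be shown with a back-of-the-envelope cost model: each storage request pays a fixed latency, so reading the same bytes as thousands of small, scattered requests costs far more than one large sequential read. The Python sketch below models only I/O time, and every parameter (data size, per-request latency, bandwidth, request count) is a made-up placeholder, not a measurement from the tech report.

```python
def read_time(num_requests: int, total_bytes: int,
              latency_s: float, bandwidth_bps: float) -> float:
    """Toy model: each request pays a fixed latency, plus transfer time."""
    return num_requests * latency_s + total_bytes / bandwidth_bps


# Hypothetical parameters for illustration only (not measured values).
MIB = 1024 * 1024
BOOT_BYTES = 200 * MIB      # OS code plus application data read during boot
LATENCY_S = 0.005           # per-request latency of networked block storage
BANDWIDTH = 100 * MIB       # sequential throughput, bytes per second

scattered = read_time(8000, BOOT_BYTES, LATENCY_S, BANDWIDTH)  # many small on-demand reads
bulk = read_time(1, BOOT_BYTES, LATENCY_S, BANDWIDTH)          # one sequential prefetch
print(f"scattered: {scattered:.1f} s, bulk: {bulk:.1f} s")     # scattered: 42.0 s, bulk: 2.0 s
print(f"I/O time saved: {1 - bulk / scattered:.0%}")           # I/O time saved: 95%
```

Under this model the transfer term is identical in both cases; the savings come entirely from paying the per-request latency once instead of thousands of times, which is what loading a ramdisk image in one operation achieves.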

Tech Report
Scripts and Code

HPC in Cloud (2011)

The emergence of cloud services brings new possibilities for constructing and using HPC platforms. However, while cloud services provide the flexibility and convenience of customized, pay-as-you-go parallel computing, multiple previous studies in the past three years have indicated that cloud-based clusters need a significant performance boost to become a competitive choice, especially for tightly coupled parallel applications.
In this work, we examine the feasibility of running HPC applications in clouds. This study distinguishes itself from existing investigations in several ways: 1) We carry out a comprehensive examination of issues relevant to the HPC community, including performance, cost, user experience, and range of user activities. 2) We compare an Amazon EC2-based platform built upon its newly available HPC-oriented virtual machines with typical local cluster and supercomputer options, using benchmarks and applications with scale and problem size unprecedented in previous cloud HPC studies. 3) We perform detailed performance and scalability analysis to locate the chief limiting factors of the state-of-the-art cloud based clusters. 4) We present a case study on the impact of per-application parallel I/O system configuration uniquely enabled by cloud services. Our results reveal that though the scalability of EC2-based virtual clusters still lags behind traditional HPC alternatives, they are rapidly gaining in overall performance and cost-effectiveness, making them feasible candidates for performing tightly coupled scientific computing. In addition, our detailed benchmarking and profiling discloses and analyzes several problems regarding the performance and performance stability on EC2.
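Scalability comparisons like these are usually expressed through speedup and parallel efficiency, and cost-effectiveness through the price of a complete run. A minimal Python sketch of these standard metrics follows; the timings, node count, and hourly price are invented placeholders, not results from the paper, and the cost function deliberately ignores billing granularity.

```python
def speedup(t_base: float, t_p: float) -> float:
    """Speedup of a p-node run over a baseline run."""
    return t_base / t_p


def parallel_efficiency(t1: float, tp: float, p: int) -> float:
    """Fraction of ideal linear scaling achieved on p nodes (1.0 is perfect)."""
    return t1 / (p * tp)


def run_cost(nodes: int, seconds: float, price_per_node_hour: float) -> float:
    """Pay-as-you-go cost of one run; ignores billing granularity such as hourly rounding."""
    return nodes * (seconds / 3600.0) * price_per_node_hour


# Placeholder timings and price, not results from the paper.
t1, t16 = 1000.0, 80.0
print(speedup(t1, t16))                  # 12.5
print(parallel_efficiency(t1, t16, 16))  # 0.78125
print(round(run_cost(16, t16, 2.0), 4))  # 0.7111
```

Efficiency that falls as p grows is the usual signature of communication overhead, which is where tightly coupled applications on virtualized clusters tend to suffer most.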

Paper at Supercomputing 2011

TCP FastOpen (2011)

Latency is critical for network applications. For this project, we modified the TCP handshake protocol to carry data in the SYN+ACK and ACK packets, which can significantly reduce latency for short connections. This was a collaborative project; I implemented most of the kernel modifications and the user-space daemons.

Code

Multipath Tor (2010)

Anonymous communication systems such as Tor are vulnerable to traffic analysis. This can be countered by adding noise traffic and using multiple communication paths. This was a collaborative project; I wrote most of the code.