Shipwright: A Human-in-the-Loop System for Dockerfile Repair
* Denotes equal contributions. Shipwright is a human-in-the-loop system for automated repair of broken Dockerfiles. Using Shipwright, we submitted 45 pull requests, 42.2% of which were accepted.
Learning from, Understanding, and Supporting DevOps Artifacts for Docker
A toolset for advanced Dockerfile parsing, Dockerfile rule mining, and rule-based static analysis of Dockerfiles. This paper seeks to improve the support for Docker (and, more generally, DevOps artifacts) by taking the first steps toward automated rule mining in this domain.
A Dataset of Dockerfiles
Expanded details on the dataset of 178,000 unique Dockerfiles used in "Learning from, Understanding, and Supporting DevOps Artifacts for Docker." In addition to a discussion of the dataset, this workshop paper features example usages and schemas for all of the unique representations we took the data through to achieve rule mining and static checking.
Semantic Robustness of Models of Source Code
* Denotes equal contributions. A framework for testing the robustness of models of source code against sequences of semantics-preserving transformations. Additionally, tools and results related to training robust models of code (using adversarial training) and gathering insights into what trained models have learned (via attribution).
Enabling Open-World Specification Mining via Unsupervised Learning
A framework for mining specifications and usage patterns without the aid of rule templates, user-directed feedback, or predefined API surfaces. This paper leverages both learned embeddings and traditional co-occurrence statistics to disentangle trace-based data.
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
A novel technique for embedding semantically rich program artifacts. This paper explores how to combine sophisticated program analysis (in the form of lightweight symbolic execution) with off-the-shelf machine learning.