
Shipwright: A Human-in-the-Loop System for Dockerfile Repair

Abstract

Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and—to our great surprise—found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose Shipwright, a human-in-the-loop system for finding repairs in broken Dockerfiles. Shipwright uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs in Dockerfiles. With the aid of Shipwright, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a “time-travel” analysis of broken Dockerfiles that were later fixed, we found that Shipwright proposed repairs that were equivalent to human-authored patches in 22.77% of the cases we studied. Finally, we compared our work with recent, state-of-the-art, static Dockerfile analyses, and found that, while static tools detected possible build-failure-inducing issues in 20.6–33.8% of the files we examined, Shipwright was able to detect possible issues in 73.5% of the files and, additionally, provide automated repairs for 19.2% of the files.
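The clustering step described above can be illustrated with a small sketch. This is not Shipwright's implementation: the paper uses a modified BERT model to embed build logs, whereas the sketch below substitutes plain word counts and cosine similarity so that it runs with the standard library alone. The function names (`embed`, `cosine`, `cluster_logs`), the similarity threshold, and the sample log lines are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(log: str) -> Counter:
    """Toy stand-in for a learned embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", log.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_logs(logs, threshold=0.5):
    """Greedy clustering: each log joins the first cluster whose
    representative (its first member) is at least `threshold` similar."""
    clusters = []  # list of (representative_vector, member_indices)
    for i, log in enumerate(logs):
        vec = embed(log)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Illustrative build-log lines (made up for this sketch, not from the study).
logs = [
    "E: Unable to locate package python-pip",
    "E: Unable to locate package libpng12-dev",
    "npm ERR! code ENOENT: no such file or directory, open package.json",
]
print(cluster_logs(logs))
```

In this toy run, the two missing-package failures land in one cluster and the npm failure in another. Grouping failures this way is what would let a human inspect one cluster at a time and write a single repair rule for it, which is the human-in-the-loop role the abstract describes.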

* Denotes equal contributions.