Public Datasets
A Dataset of Dockerfiles
Find the artifact on GitHub or Zenodo. This repository contains both an artifact (evaluated Reusable and Available during the ICSE 2020 artifact evaluation) and the complete dataset of Dockerfiles extracted from GitHub.
If you just want the source-level Dockerfiles (de-duplicated), you can find them here.
If you just want the Dockerfiles stored as Abstract Syntax Trees, you can find them here.
For details on the dataset, our applications of the data, and the tools included in the artifact, see the following resources:
- A Dataset of Dockerfiles (MSR’2020 Workshop Paper).
- Learning from, Understanding, and Supporting DevOps Artifacts for Docker (ICSE’2020 Conference Paper).
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
Find the artifact on GitHub or Zenodo, and check out our pre-built Docker images on Docker Hub. Our lightweight symbolic execution engine, lsee, and our source-to-source transformation engine, c2ocaml, are both available on GitHub.
If you just want the raw data (and have Docker installed), you can run:
docker pull jjhenkel/code-vectors-artifact:dataset
See here for raw dataset details.