Public Datasets

A Dataset of Dockerfiles


Find the artifact on GitHub or Zenodo. This repository contains both an artifact (evaluated as Reusable and Available during the ICSE 2020 artifact evaluation) and the complete dataset of Dockerfiles extracted from GitHub. See the following resources for details:

If you just want the source-level Dockerfiles (de-duplicated), you can find them here.

If you just want the Dockerfiles stored as Abstract Syntax Trees, you can find them here.

For details on the dataset, our applications of the data, and the tools included in the artifact, see the following resources:


Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces


Find the artifact on GitHub or Zenodo. Check out our pre-built Docker images on Docker Hub. Our lightweight symbolic execution engine, lsee, and our source-to-source transformation engine, c2ocaml, are both available on GitHub.

If you just want the raw data (and have Docker installed), you can run:

docker pull jjhenkel/code-vectors-artifact:dataset

See here for raw dataset details.
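
If you would rather script the pull, the following is a minimal sketch that wraps the command above with a check that the Docker CLI is present and a confirmation that the image arrived. The function names `have_docker` and `fetch_dataset_image` are hypothetical helpers introduced for illustration; only the image name and tag come from this page. The layout of the data inside the image is not described here, so the sketch stops once the image is available locally.

```shell
#!/bin/sh
# Image name and tag are taken from the pull command on this page.
IMAGE="jjhenkel/code-vectors-artifact:dataset"

# have_docker (hypothetical helper): succeeds only if the docker CLI is on PATH.
have_docker() {
  command -v docker >/dev/null 2>&1
}

# fetch_dataset_image (hypothetical helper): pull the image, then confirm
# that it exists locally by printing its ID and size in bytes.
fetch_dataset_image() {
  if ! have_docker; then
    echo "docker not found on PATH; install Docker first" >&2
    return 1
  fi
  docker pull "$IMAGE" || return 1
  docker image inspect --format '{{.Id}} {{.Size}}' "$IMAGE"
}
```

After sourcing the snippet, call `fetch_dataset_image`. A non-zero exit status means that Docker is missing or that the pull failed.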