CS 838: Big Data Systems | Fall '16: Course Home Page
Assignments and projects are to be done in groups of 2 students.
There will be three assignments and one course project.
Assignment 1 is mandatory for each group; after that, students may choose either Assignments 2/3 or
the course project. Students who wish to do both the assignments and the project will be given extra credit.
The assignments are well-defined and focus on providing
hands-on experience with various big data systems, whereas the projects tend
to be more open-ended and are suited for those who wish to pursue research
in this field.
You will be required to use resources in the cloud for all experiments. In the
past, we have used CloudLab. For the current
offering, the plan is to use Azure. Azure offers a $200
free-trial credit for all those who sign up for an account. Students are
encouraged to use these credits for obtaining resources required for
assignments/projects. We will provide scripts that will help with setting up
Each group is expected to turn in a PDF writeup answering the
questions posed in the assignment. The tentative list of assignments is as follows:
- Assignment 0 - Cluster setup on Azure :
In this assignment, you will set up a cluster using the Azure CLI.
This cluster will be used to carry out the remaining assignments
during the course of the semester. Released Sep 9.
- Assignment 1 - Fun with MR and Tez :
In this assignment, you will run canonical benchmark workloads using
Apache Hive atop Map-Reduce and Tez. You will study the differences in
the two systems, and understand how to tune them to obtain good
performance. You will also learn how to write simple MapReduce applications. Released Sep 9. Due
Oct 3.
- Assignment 2 - Even More Fun with Spark, Structured Streaming and Storm :
In this assignment, you will learn how to write native Spark applications and understand which factors affect their performance. You will also
learn how to write end-to-end streaming applications using Structured Streaming. Lastly, you will develop and run a real-time tweet processing streaming application on Apache Storm.
Released Oct 7. Due Nov 4.
- Assignment 3 - Supreme Fun with TensorFlow and GraphX :
In this assignment, you will learn how to write native applications on TensorFlow and GraphX.
Released Nov 13. Due Dec 2.
A list of course projects to consider will be released early in the semester.
Each project in the list will contain a short description of the problem, the
questions that you will most likely answer and a sketch of initial steps to
take. Do not panic if you are not able to comprehend the description and goals upon first reading.
The related work will be covered in class as the semester progresses. The purpose of releasing the list early
is to give sufficient time for background reading.
The schedule for the project deliverables is as follows:
Related Work Report