There will be three assignments and one half-semester-long course project.
Assignments: Assignments are to be done in groups of three students. All assignments will use CloudLab.
Each group is expected to turn in a PDF writeup answering the questions posed in the assignment.
- Assignment 1 - Fun with MR and Tez:
In this assignment, you will run canonical benchmark workloads using
Apache Hive atop MapReduce and Tez. You will study the differences
between the two systems and learn how to tune them to obtain good
performance. Released Sep 8. Due Sep 27.
- Assignment 2 - Even More Fun with Spark:
In this assignment, you will run the same workload and
analysis as in Assignment 1, but atop SparkSQL and Apache Spark. You will also
learn how to write native Spark queries and experiment with persisting
RDDs. Released Oct 2. Due Oct 12.
- Assignment 3 - Supreme fun with Storm, GraphX and MLlib:
In this assignment, you will develop and run streaming, machine learning and graph processing
applications using frameworks such as Apache Storm, and MLlib and GraphX on Apache Spark. Released Oct 16. Due Nov 6.
Projects: Projects will be released in mid-October and will span 8 weeks (i.e., through the end of class).