9/6 |
Introduction |
Shivaram: slides |
Fill out presentation preference form. |
9/7 |
|
|
Assignment 0 |
9/11 |
The Datacenter as a Computer, Chapters 1 and 2
VL2: A Scalable and Flexible Data Center Network (Optional)
|
Presentation Tips
Slides
|
|
|
Storage Systems |
|
|
9/13 |
The Google File System
Flat Datacenter Storage (Optional)
f4: Facebook’s Warm BLOB Storage System (Optional)
|
Shivaram
Arjun Balasubramanian
Aarati Kakaraparthy |
|
9/17 |
|
|
Assignment 1 out |
9/18 |
Bigtable: A Distributed Storage System for Structured Data
Dynamo: Amazon’s Highly Available Key-value Store (Optional)
Spanner: Google's Globally-Distributed Database (Optional)
|
Shivaram
Saurabh Agarwal
Adarsh Kumar |
|
|
Computation Frameworks |
|
|
9/20 |
MapReduce: Simplified Data Processing on Large Clusters
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (Optional)
CIEL: a universal execution engine for distributed data-flow computing (Optional)
|
Shivaram
Yahn-Chung Chen
Roshan G Lal
|
|
9/25 |
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (Optional)
Encapsulation of parallelism in the Volcano query processing system (Optional)
|
Shivaram Venkataraman
Derek Paulsen
Huawei Wang
|
|
|
Scheduling |
|
|
9/27 |
Borg: Large-scale cluster management at Google with Borg. See also Borg, Omega, and Kubernetes
YARN: Yet Another Resource Negotiator (Optional)
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center (Optional)
|
Shivaram Venkataraman
Manjunath Shettar
Jayashankar Tekkedatha
|
|
10/1 |
|
|
Assignment 1 due |
10/2 |
DRF: Dominant Resource Fairness
Tetris: Multi-Resource Packing for Cluster Schedulers (Optional)
Quincy: Fair Scheduling for Distributed Computing Clusters (Optional)
|
Shivaram Venkataraman
Sanchit Jain
Steve Wang
|
|
|
Machine Learning |
|
|
10/4 |
Towards a Unified Architecture for in-RDBMS Analytics
DimmWitted: A Study of Main-Memory Statistical Analytics (Optional)
KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics (Optional)
|
Shivaram (Bismarck)
Shivaram (DimmWitted)
Yudhister Satija
|
Assignment 2 out. Submit project topics, group |
10/9 |
Guest lecture on Scalable ML Algorithms |
|
|
10/11 |
TensorFlow: A system for large-scale machine learning
Ray: A Distributed Framework for Emerging AI Applications (Optional)
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
Also see document on programming style (Optional)
|
Shivaram
Sonu Agarwal
Mingren Shen
|
|
10/16 |
Scaling Distributed Machine Learning with the Parameter Server
STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning (Optional)
PipeDream: Fast and Efficient Pipeline Parallel DNN Training (Optional)
|
Shivaram
Srujith Poondla
Varun Batra
|
Assignment 2 due |
10/18 |
Clipper: A Low-Latency Online Prediction Serving System
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster (Optional)
Pretzel: Opening the Black Box of Machine Learning Prediction Serving Systems (Optional)
|
Shivaram
Siddhant Garg
Qinyuan Sun |
|
|
SQL Frameworks |
|
|
10/23 |
Spark SQL: Relational Data Processing in Spark
Impala: A Modern, Open-Source SQL Engine for Hadoop (Optional)
Dremel: Interactive Analysis of Web-Scale Datasets (Optional)
|
Shivaram
Yogesh Chockalingam
Philip Martinkus
|
Project introduction due. |
10/25 |
Global analytics in the face of bandwidth and regulatory constraints
TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks (Optional)
CLARINET: WAN-Aware Optimization for Analytics Queries (Optional)
|
Shivaram
Abbinaya Kalyanaraman
Robert Claus |
|
10/30 |
Trill: A High-Performance Incremental Query Processor for Diverse Analytics
Rethinking SIMD Vectorization for In-Memory Databases (Optional)
Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited (Optional)
|
Shivaram
Sri Harshal Parimi
Yanghui Kang
|
|
|
Stream Processing |
|
|
11/1 |
Naiad: A Timely Dataflow System
Twitter Heron: Stream Processing at Scale (Optional)
Apache Flink™: Stream and Batch Processing in a Single Engine (Optional)
|
Shivaram
Zijun Ma
Akshaya Kalyanaraman
|
|
11/5 |
|
|
Midterm on 11/5 from 7:15pm to 9:15pm. Venue: 1221 CS |
11/6 |
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Drizzle: Fast and Adaptable Stream Processing at Scale (Optional)
Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems (Optional)
|
Shivaram
Kaushik Chandrasekhar
Samhith Venkatesh
|
|
11/8 |
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Realtime Data Processing at Facebook (Optional)
Aurora: a new model and architecture for data stream management (Optional)
|
Shivaram
Abhay Venkatesh
Rahul Jayan
|
|
|
Graph Processing |
|
|
11/13 |
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
GraphX: Graph Processing in a Distributed Dataflow Framework (Optional)
Scalability! But at what COST? (Optional)
|
Bidyut Hota
Abhinav Garg
|
|
11/15 |
Arabesque: A System for Distributed Graph Mining
Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration (Optional)
ASAP: Fast, Approximate Pattern Mining at Scale (Optional)
|
Shivaram
Shuoxuan Dong
Yunang Chen |
|
|
Monitoring, Debugging |
|
|
11/20 |
Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
Making Sense of Performance in Data Analytics Frameworks (Optional)
COZ: Finding Code that Counts with Causal Profiling (Optional)
|
Shivaram
Zi Wang
Anuja Golechha
|
|
11/22 |
Happy Thanksgiving! |
|
|
|
New Hardware Models |
|
|
11/27 |
Occupy the Cloud: Distributed Computing for the 99%
Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (Optional)
Serverless Computation with OpenLambda (Optional)
|
Shivaram
Wen-Fu Lee
Chirayu Garg
|
|
11/29 |
FaRM: Fast Remote Memory
No compromises: distributed transactions with consistency, availability, and performance (Optional)
FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs (Optional)
|
Shivaram
Xiuyuan He
Shivaram
|
|
12/4 |
In-Datacenter Performance Analysis of a Tensor Processing Unit
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services (Optional)
Strata: A Cross Media File System (Optional)
|
Shivaram
Venkatesh Somyajulu
Derek Hancock
|
|
12/6 |
"One Size Fits All": An Idea Whose Time Has Come and Gone
|
Shivaram
|
|
12/13 |
|
|
Poster session 3:30pm-5pm |
12/17 |
|
|
Final project reports due |