[Architecture: Compute+Overall (B+09)] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, L.A. Barroso, U. Hölzle, Synthesis Lectures on Computer Architecture, 2009. Chapters 1 and 2.
[Architecture: Networks: Optional (G+09)] VL2: A Scalable and Flexible Data Center Network, Greenberg et al, SIGCOMM, 2009.
[Architecture: Storage (S+10)] The Hadoop Distributed File System, Shvachko et al, MSST, 2010.
[Streaming: Heron Optional (KB+15)] Twitter Heron: Stream Processing at Scale, Kulkarni et al, SIGMOD, 2015.
[Streaming: SparkStreaming (ZD+13)] Discretized Streams: Fault-Tolerant Streaming Computation at Scale, Zaharia et al, SOSP, 2013. Also read this introduction to Structured Streaming.
[QMS: Kafka (K+11)] Kafka: a Distributed Messaging System for Log Processing, Kreps et al, NetDB Workshop, 2011. Also read this comparison of widely used Queuing Messaging Processing Systems.
[Streaming: rStreams (L+16)] StreamScope: Continuous Reliable Distributed Processing of Big Data Streams, Lin et al, NSDI, 2016.
Applications: Graph Processing
[GraphProc: Pregel (M+10)] Pregel: A System for Large-Scale Graph Processing, Malewicz et al, SIGMOD, 2010.
[GraphProc: PowerGraph: Optional (GL+12)] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Gonzalez et al, OSDI, 2012.
[Graph Storage: TAO Optional (B+13)] TAO: Facebook's Distributed Data Store for the Social Graph, Bronson et al, USENIX ATC, 2013.
[GraphProc: GraphX (G+14)] GraphX: Graph Processing in a Distributed Dataflow Framework, Gonzalez et al, OSDI, 2014.