NETWORK LATENCY ISSUES
CS838, October 24, 2012

INTRO

- History of performance improvement (It's Time for Low Latency. HotOS '11)
  * How much improvement has there been between 1983 and 2011 for each of
    these items, given knowledge of the values today?

                       1983        2011       Improvement
      CPU speed        1 x 10MHz   4 x 3GHz   > 1,000x
      Memory size      <= 2MB      8GB        >= 4,000x
      Disk capacity    <= 30MB     2TB        > 60,000x
      Network BW       3Mbps       10Gbps     > 3,000x
      RTT              2.54ms      80us       32x

- Networking research has focused largely on improving bandwidth
* Why do we care so much about latency?
  - Serving content to humans quickly (HULL argues latency doesn't matter
    when a human is involved; do you buy this?)

SOURCES OF LATENCY

- Example scenario:
  - Front-end web servers running as VMs
  - Back-end key/value stores hold parts of web pages
  - Requests from clients come over the Internet
* Take 5 minutes and list every source of latency you can think of. Do not
  just think about network-related latency.
  - Application
    - Time to construct the page (depends on CPU speed)
    - Time to perform a key lookup (depends on memory speed and disk speed)
  - Memory
    - Read from memory to perform an index lookup (20ns per read)
  - Disk
    - Read the actual data block from disk or flash
    - Disk seek time
    - Queueing delay while waiting for other reads/writes to complete
  - Operating system (15us)
    - Waiting for the process to be scheduled
    - Interrupt handling for reading from disk and NIC
    - Packet de-multiplexing
    - Generating network protocol control messages
    - Performing a route lookup and appending the IP header
    - Performing an ARP lookup and appending the Ethernet header
    - Firewall
  - Protocol
    - ARP requests
    - TCP handshake time
    - TCP slow-start
    - TCP congestion control -- AIMD
    - TCP retransmissions -- waiting for duplicate ACKs or a timeout
    - TCP delayed ACKs
  - Hypervisor
    - Waiting for the VM to be scheduled -- an issue for Amazon EC2 small
      instances (IEEE INFOCOM 2010)
    - Packet de-multiplexing
    - Copy from NIC into VM-accessible memory
  - NIC (2.5-32us)
    - Checksumming, TCP offload (e.g., segment reassembly)
    - Queueing
    - Contending for access to the medium
    - Transmission time
  - Data center network
    - Propagation latency (5ns/m)
    - Middleboxes
    - Switches (10-30us)
      - Queueing
      - Sending across the switch fabric
      - Congestion -> loss
  - Wide-area network
    - Propagation latency
    - Sub-optimal routing
    - Congestion -> loss
- Timing breakdown for network aspects (It's Time for Low Latency. HotOS '11)

      Component                 Delay      Round-Trip
      Network Switch            10-30μs    100-300μs
      Network Interface Card    2.5-32μs   10-128μs
      OS Network Stack          15μs       60μs
      Speed of Light (Fiber)    5ns/m      0.6-1.2μs

  (Summing a typical intra-data-center path -- several switch hops plus two
  NICs and two OS stacks, each crossed twice -- gives well over 150us
  round-trip even at the low end.)

REDUCING LATENCY

* What research efforts have reduced network latency?
  - Transition from copper to fiber
  - Hardware improvements in switching fabrics
  - Algorithmic advances in switching
  - Alternative queueing models in switches (QoS)
  - More responsive transport protocols (DCTCP)
  - Transport protocols with less control overhead (SPDY)
  - Pushing computationally intensive actions to NICs (e.g., TCP checksums)
  - Direct NIC access from VMs (vPF_RING)
  - Wide-area caching

HULL

* What does HULL do?
  - Reduces queueing in the network by proactively avoiding congestion
* Why do we want to reduce queueing?
  - Eliminates latency at switches waiting for transmission
  - Makes latency at switches consistent
* Why do we need to signal congestion beforehand?
  - The normal sign of congestion is queue build-up, but we don't want
    queueing
  - If a queue overflows, then we need to drop packets
    - It takes time for TCP to detect the loss
    - It takes time to retransmit the packets
* What mechanisms does HULL use? (a toy sketch in code follows this list)
  - Phantom queues (PQ)
    - Virtual queues that track queue build-up by tracking the packet rate
    - One PQ per switch egress port
    - Set explicit congestion notification (ECN) marks in packets based on
      link utilization
    * Need to leave headroom -- why?
      - Need to avoid queueing
      - Cannot avoid queueing entirely unless packets are perfectly spaced
      - Since packets aren't perfectly spaced, we leave some headroom to
        tolerate imperfect spacing and still avoid queueing
  - DCTCP + ECN
    - TCP normally cuts its window in half when congestion is signaled
      - Causes lower average bandwidth and lots of fluctuation
    - DCTCP calculates the fraction of packets with ECN marks and reduces
      the window size proportionally
      - Minimizes bandwidth fluctuation
  - Packet pacing
    - Eliminates bursts that cause an unwarranted response to congestion
    - Packets pass through a pacer as they leave the NIC
    - Enforces an average packet rate over small time intervals
    - Only long flows are paced; they are bandwidth-sensitive rather than
      latency-sensitive
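To make the marking and window arithmetic concrete, here is a toy Python
sketch of the two rate-control pieces above -- not HULL's actual
implementation. The constants (drain factor, marking threshold) are
illustrative choices, though the DCTCP update rule (alpha <- (1-g)*alpha +
g*F; cwnd <- cwnd*(1 - alpha/2)) follows the DCTCP paper.

```python
LINE_RATE = 10e9 / 8   # 10 Gbps link, in bytes/sec
GAMMA = 0.95           # phantom queue drains at 95% of line rate (headroom)
MARK_THRESHOLD = 3000  # illustrative: bytes of virtual backlog before marking

class PhantomQueue:
    """Virtual queue for one egress port; no packets are actually stored."""
    def __init__(self):
        self.backlog = 0.0     # virtual bytes "queued"
        self.last_time = 0.0   # arrival time of the previous packet (seconds)

    def on_packet(self, now, size):
        # Drain the virtual queue at gamma * line rate since the last packet,
        # then account for this packet's bytes.
        self.backlog = max(0.0, self.backlog
                           - (now - self.last_time) * GAMMA * LINE_RATE)
        self.last_time = now
        self.backlog += size
        return self.backlog > MARK_THRESHOLD   # True => set the ECN bit

class DctcpSender:
    """DCTCP-style sender: alpha is an EWMA of the fraction of marked packets."""
    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.alpha = 0.0
        self.g = 0.0625        # EWMA gain

    def on_window_acked(self, acked, marked):
        # Called roughly once per RTT with counts for the last window of data.
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Cut in proportion to congestion instead of TCP's blanket halving.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1     # additive increase, one segment per RTT

# Example: 3 of 10 packets in the last window carried ECN marks.
s = DctcpSender(cwnd=100.0)
s.on_window_acked(acked=10, marked=3)
print(s.alpha, s.cwnd)   # alpha ~= 0.019, cwnd ~= 99.06
```

With alpha near 0 the cut is tiny; with every packet marked it approaches
TCP's halving. That is why DCTCP keeps throughput smooth while still backing
off under persistent congestion.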
OTHER LATENCY REDUCTION CASES

- Disk latency vs. network latency
  * What is the latency of a read/write on a regular disk?
    - We already said network latency is at least about 150us round-trip;
      the average in a data center is more like a few ms
    - With regular disks, it is cheaper to do network IO than disk IO
  * What is the latency of a read/write on flash?
    - With flash disks, it is cheaper to do flash IO than network IO
  * What does this mean for applications right now (assuming we don't do
    anything about network latency)? Think about HDFS.
    - Data locality is key
  * What is the latency of a read/write on memory?
    - It has always been faster to read/write memory than the network
    - Leads to systems that use massive amounts of memory, e.g., memcached
- Remote Direct Memory Access (RDMA)
  - Memory IO is very low latency
  - There is a limit to how much memory we can put in one machine -> need to
    access memory on other machines
  - Waiting for the OS on the other side to handle a request adds latency ->
    want to avoid involving the OS
  - A request for a chunk of memory is handled directly by the NIC
    - NICs on both sides have DMA
    - Requests are handled in NIC hardware
- WAN tricks
  - Access the geographically closest service
    - Migrate services between data centers as demand shifts
    - Move VMs around the world over the course of a day
    - Especially important for latency-sensitive apps like game servers
  - Access data from a geographically close cache (CDNs)
  - Metric-based overlay routing (see the sketch below)
    - The focus of Internet routing is reachability, not performance
    - An overlay network can offer better latency
    - Need to be cognizant of the timescale at which latency changes
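To make the last point concrete, a minimal sketch of metric-based overlay
path selection: given measured pairwise latencies, check whether relaying
through an overlay node beats the direct Internet path. All node names and
latency values are made up for illustration.

```python
# Toy sketch of metric-based overlay routing: among candidate relay nodes,
# pick the one-hop overlay path if it beats the direct Internet path.
# All latencies (in ms) are made-up illustrative measurements.

rtt = {
    ("src", "dst"): 120,                           # direct Internet path
    ("src", "relayA"): 30, ("relayA", "dst"): 50,
    ("src", "relayB"): 45, ("relayB", "dst"): 80,
}

def best_path(src, dst, relays):
    best_ms, best_hops = rtt[(src, dst)], [src, dst]   # start with direct
    for r in relays:
        via = rtt[(src, r)] + rtt[(r, dst)]
        if via < best_ms:
            best_ms, best_hops = via, [src, r, dst]
    return best_ms, best_hops

print(best_path("src", "dst", ["relayA", "relayB"]))
# -> (80, ['src', 'relayA', 'dst']): relaying beats the direct path by 40ms
```

A real overlay would refresh these measurements continuously; how often is
exactly the question raised by the last bullet, the timescale at which
latency changes.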