The paper focuses on a consistent trend seen across different technology areas: latency improves much more slowly than bandwidth (b/w). The author explains some plausible reasons behind this imbalance and describes a few commonly used techniques for coping with it.

Across technologies such as processors, memory, networks, and disks, it is observed that b/w has improved much faster than latency. More specifically, in the time it takes b/w to double, latency improves by only around 30%. This trend holds across technological milestones of varying time intervals in all of these areas.

B/w can be expressed as the product of frequency (the inverse of latency) and the number of bits accessed or transferred in parallel. With Moore's law holding good, transistors are becoming smaller and faster. Faster transistors help improve both latency and b/w. Chips also contain more transistors, which helps b/w because parallelism can be exploited across them, but it hurts latency because of the coordination required among such a large number of transistors. Moreover, since wire delay does not scale well, signals take longer to travel across larger chips, which also hurts latency. From the definition of b/w it is clear that an improvement (decrease) in latency directly improves (increases) b/w, but not vice versa: improving the parallelism with which data is accessed helps b/w but not latency. Distance sets a lower bound on latency, since nothing can travel faster than light, whereas there is no such bound on the amount of parallelism that can be exploited to help b/w. Another reason may be commercial motivation to improve b/w more, as it is probably easier to convince customers with higher b/w than with lower latency. Sometimes an improvement in b/w even comes at the cost of latency, since techniques that help b/w, e.g. buffering or adding more modules to exploit parallelism, may hurt latency. Software overhead also hurts latency more, because its effect on b/w can be more easily amortized over larger messages.

The author notes that there are techniques that try to cope with this gap. Among them, caching is widely used: locality of reference is exploited so that many requests are serviced with a smaller latency. Duplicating data also helps hide some latency, and prediction is another way to overlap or hide it. However, the author expects this disparity to become uglier, since all of these techniques have already been pushed close to their limits. The author suggests that, when designing a system, it is wiser to pick a design that leverages improvements in b/w over one that relies on improvements in latency.
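To make the rule of thumb concrete, here is a minimal Python sketch; the number of generations and the starting values are illustrative assumptions, not data from the paper. It simply compounds "b/w doubles while latency improves by about 30%" over several hypothetical technology generations:

    # Illustrative only: assumed starting point and generation count,
    # compounding the "b/w doubles while latency improves ~30%" rule of thumb.
    bandwidth_gain = 1.0
    latency_gain = 1.0
    for generation in range(1, 5):
        bandwidth_gain *= 2.0   # b/w doubles per generation
        latency_gain *= 1.3     # latency improves only ~30% in the same time
        print(f"gen {generation}: b/w x{bandwidth_gain:.0f}, latency x{latency_gain:.2f} better")
    # After four generations b/w is 16x but latency is only ~2.9x better,
    # so the relative gap between the two keeps widening.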

The paper makes a nice observation about the disparity between b/w and latency and tries to explain a few reasons behind it, and the author gives some valuable suggestions on how to account for it when designing a new system. But one question remains unanswered: why should we worry about this disparity between b/w and latency in the first place? It is also not clear why the author thinks that finding a new trick to hide latency will be tougher from now on. Finally, the author opined that b/w is redundant (available in surplus); but with the advent of multicore chips, does that still hold with regard to memory b/w requirements?