(2.2.3) Contemporary DRAM architectures

Vinod Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge, "A Performance Comparison of Contemporary DRAM Architectures," ISCA 1999.

A survey and performance comparison of commercial DRAM architectures.

Observations
     One-time tradeoff between cost, bandwidth, and latency
          Latency is minimized by ganging together multiple DRAMs into a wide structure
          (page mode, interleaving, etc. provide a one-time boost)
     Widening buses will present new optimization opportunities
          (exploiting locality)
     Buses as wide as an L2 cache line yield the best memory latency, but do not halve the latency of a bus half as wide (see the bus-width sketch after this list)
          A bus of width N/2 is a good cost/performance design choice
     Critical-word-first does not mix well with burst mode
     Choice of refresh mechanism can alter average memory latency
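
A rough way to see the bus-width point is to model a cache-line fill as a fixed DRAM access cost plus one bus cycle per transfer. A minimal sketch in C, where all timing and size parameters are illustrative assumptions rather than figures from the paper:

#include <stdio.h>

/* Illustrative timings (ns) -- assumed values, not from the paper. */
#define ROW_ACCESS   30.0   /* RAS-to-data component            */
#define COL_ACCESS   20.0   /* CAS-to-data component            */
#define BUS_CYCLE    10.0   /* one transfer on the memory bus   */
#define LINE_BYTES   128    /* L2 cache line size (assumed)     */

/* Latency to fill one cache line over a bus 'width' bytes wide. */
static double fill_latency(int width)
{
    int transfers = LINE_BYTES / width;   /* bus beats per line fill */
    return ROW_ACCESS + COL_ACCESS + transfers * BUS_CYCLE;
}

int main(void)
{
    /* Halving the bus width does not double the latency, because the
       fixed row/column access cost is unchanged. */
    for (int width = LINE_BYTES; width >= 8; width /= 2)
        printf("bus %3d bytes: %.0f ns per line fill\n",
               width, fill_latency(width));
    return 0;
}

Because the fixed row/column cost never shrinks, doubling the bus width from N/2 to N saves only a few bus cycles, which is why N/2 is often the better cost/performance point.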

Conventional DRAM:
     Multiplexed row and column addresses (RAS/CAS strobes), then data out; the full sequence repeats for every access
Fast Page Mode (FPM) DRAM:
     Row address once, then repeated column-address/data-out pairs against the open row (a timing sketch follows this list)
Extended Data Out (EDO) DRAM:
     An added output latch lets data-out overlap with the next column address
Synchronous DRAM (SDRAM):
     Clocked interface; data is driven out in bursts synchronous with the clock
Enhanced Synchronous DRAM (ESDRAM):
     SDRAM with an SRAM row cache, so precharge can overlap with reads from the cached row
Rambus DRAM (RDRAM):
     Partitioned into four banks, so up to four rows can remain open at once (one per bank)
Direct Rambus DRAM (DRDRAM):
     1.6 GB/s; uses 17 half-row buffers shared between adjacent banks instead of 16 full-row buffers (an area consideration)
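
To make the conventional/FPM/EDO distinction concrete, the sketch below counts cycles for a run of accesses that all land in the same row: conventional DRAM repeats the whole row-plus-column sequence, FPM pays the row cost once, and EDO additionally overlaps data-out with the next column address. Cycle counts are assumed for illustration, not datasheet values.

#include <stdio.h>

/* Illustrative cycle counts -- assumptions, not datasheet values. */
#define T_ROW  5   /* row address + RAS latency    */
#define T_COL  3   /* column address + CAS latency */
#define T_OUT  2   /* data-out time                */

/* Conventional DRAM: every access re-sends the row address. */
static int conventional(int accesses)
{
    return accesses * (T_ROW + T_COL + T_OUT);
}

/* FPM DRAM: row opened once, then column + data-out per access. */
static int fast_page_mode(int accesses)
{
    return T_ROW + accesses * (T_COL + T_OUT);
}

/* EDO DRAM (simplified): the output latch lets data-out overlap the
   next column address, so only the last access pays full data-out. */
static int edo(int accesses)
{
    return T_ROW + accesses * T_COL + T_OUT;
}

int main(void)
{
    int n = 8;  /* accesses that hit the same open row */
    printf("conventional: %d cycles\n", conventional(n));
    printf("fast page   : %d cycles\n", fast_page_mode(n));
    printf("EDO         : %d cycles\n", edo(n));
    return 0;
}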

Conclusions
> Contemporary designs address the bandwidth problem but not the latency problem
> Bus speed and organization strongly affect overall memory latency
> Locality exists even in the access stream that reaches DRAM
> Future designs can exploit this locality for speedups (see the trace sketch below)
> Presently, designs rely on page mode and internal interleaving to achieve a one-time performance boost
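
The locality conclusion can be checked directly on an address trace: map each address to a (bank, row) pair and count how often an access finds its row already open in that bank's row buffer. A minimal sketch with an assumed geometry (four row-interleaved banks, 2 KB rows) and a made-up trace:

#include <stdio.h>
#include <stdint.h>

/* Assumed DRAM geometry for illustration. */
#define NUM_BANKS   4
#define ROW_SHIFT   11                 /* 2 KB rows */
#define BANK_MASK   (NUM_BANKS - 1)

int main(void)
{
    /* Tiny stand-in for a real address trace: a sequential run
       followed by a large-stride run. */
    uint32_t trace[64];
    for (int i = 0; i < 32; i++) trace[i]      = 0x100000 + i * 64;
    for (int i = 0; i < 32; i++) trace[32 + i] = 0x200000 + i * 4096;

    int32_t open_row[NUM_BANKS] = { -1, -1, -1, -1 };
    int hits = 0, total = 0;

    for (int i = 0; i < 64; i++) {
        uint32_t row  = trace[i] >> ROW_SHIFT;
        uint32_t bank = row & BANK_MASK;      /* row-interleaved banks */
        if (open_row[bank] == (int32_t)row)
            hits++;                           /* row-buffer (page) hit */
        else
            open_row[bank] = (int32_t)row;    /* open the new row      */
        total++;
    }
    printf("row-buffer hit rate: %d/%d\n", hits, total);
    return 0;
}

The sequential half of the trace hits the open row almost every time, while the large-stride half misses on every access; that difference is the kind of locality page mode and open-row policies exploit.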