(2.2.3) Contemporary DRAM architectures

Vinod Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge, "A Performance Comparison of Contemporary DRAM Architectures," ISCA 1999.

A survey and performance comparison of commercial DRAM architectures.

Observations
     One-time tradeoff between cost, bandwidth, and latency
          Latency is minimized by ganging together multiple DRAMs into a wide structure
          (page mode, interleaving, etc. provide a one-time boost)
     Widening buses will present new optimization opportunities
          (exploiting locality)
     Buses as wide as an L2 cache line yield the best memory latency, but do not halve the latency of a bus half as wide (see the bus-width sketch after this list)
          A bus of width N/2 is a good cost/performance design choice
     Critical-word-first does not mix well with burst mode
     Choice of refresh mechanism can alter average memory latency
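
A rough way to see the bus-width point is to model a cache-line fill as a fixed DRAM access cost plus one bus cycle per transfer. A minimal sketch in C, where all timing and size parameters are illustrative assumptions rather than figures from the paper:

#include <stdio.h>

/* Illustrative timings (ns) -- assumed values, not from the paper. */
#define ROW_ACCESS   30.0   /* RAS-to-data component            */
#define COL_ACCESS   20.0   /* CAS-to-data component            */
#define BUS_CYCLE    10.0   /* one transfer on the memory bus   */
#define LINE_BYTES   128    /* L2 cache line size (assumed)     */

/* Latency to fill one cache line over a bus 'width' bytes wide. */
static double fill_latency(int width)
{
    int transfers = LINE_BYTES / width;   /* bus beats per line fill */
    return ROW_ACCESS + COL_ACCESS + transfers * BUS_CYCLE;
}

int main(void)
{
    /* Halving the bus width does not double the latency, because the
       fixed row/column access cost is unchanged. */
    for (int width = LINE_BYTES; width >= 8; width /= 2)
        printf("bus %3d bytes: %.0f ns per line fill\n",
               width, fill_latency(width));
    return 0;
}

Because the fixed row/column cost never shrinks, doubling the bus width from N/2 to N saves only a few bus cycles, which is why N/2 is often the better cost/performance point.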

Conventional DRAM:
     Multiplexed row and column addresses (RAS/CAS strobes), then data out; the full sequence repeats for every access
Fast Page Mode (FPM) DRAM:
     Row address once, then repeated column-address/data-out pairs against the open row (a timing sketch follows this list)
Extended Data Out (EDO) DRAM:
     An added output latch lets data-out overlap with the next column address
Synchronous DRAM (SDRAM):
     Clocked interface; data is driven out in bursts synchronous with the clock
Enhanced Synchronous DRAM (ESDRAM):
     SDRAM with an SRAM row cache, so precharge can overlap with reads from the cached row
Rambus DRAM (RDRAM):
     Partitioned into four banks, so up to four rows can remain open at once (one per bank)
Direct Rambus DRAM (DRDRAM):
     1.6 GB/s; uses 17 half-row buffers shared between adjacent banks instead of 16 full-row buffers (an area consideration)
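
To make the conventional/FPM/EDO distinction concrete, the sketch below counts cycles for a run of accesses that all land in the same row: conventional DRAM repeats the whole row-plus-column sequence, FPM pays the row cost once, and EDO additionally overlaps data-out with the next column address. Cycle counts are assumed for illustration, not datasheet values.

#include <stdio.h>

/* Illustrative cycle counts -- assumptions, not datasheet values. */
#define T_ROW  5   /* row address + RAS latency    */
#define T_COL  3   /* column address + CAS latency */
#define T_OUT  2   /* data-out time                */

/* Conventional DRAM: every access re-sends the row address. */
static int conventional(int accesses)
{
    return accesses * (T_ROW + T_COL + T_OUT);
}

/* FPM DRAM: row opened once, then column + data-out per access. */
static int fast_page_mode(int accesses)
{
    return T_ROW + accesses * (T_COL + T_OUT);
}

/* EDO DRAM (simplified): the output latch lets data-out overlap the
   next column address, so only the last access pays full data-out. */
static int edo(int accesses)
{
    return T_ROW + accesses * T_COL + T_OUT;
}

int main(void)
{
    int n = 8;  /* accesses that hit the same open row */
    printf("conventional: %d cycles\n", conventional(n));
    printf("fast page   : %d cycles\n", fast_page_mode(n));
    printf("EDO         : %d cycles\n", edo(n));
    return 0;
}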

Conclusions
> Contemporary designs address the bandwidth problem but not the latency problem
> Bus speed and organization strongly affect overall memory latency
> Locality exists even in the access stream that reaches DRAM
> Future designs can exploit this locality for speedups (see the trace sketch below)
> Presently, designs rely on page mode and internal interleaving to achieve a one-time performance boost
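
The locality conclusion can be checked directly on an address trace: map each address to a (bank, row) pair and count how often an access finds its row already open in that bank's row buffer. A minimal sketch with an assumed geometry (four row-interleaved banks, 2 KB rows) and a made-up trace:

#include <stdio.h>
#include <stdint.h>

/* Assumed DRAM geometry for illustration. */
#define NUM_BANKS   4
#define ROW_SHIFT   11                 /* 2 KB rows */
#define BANK_MASK   (NUM_BANKS - 1)

int main(void)
{
    /* Tiny stand-in for a real address trace: a sequential run
       followed by a large-stride run. */
    uint32_t trace[64];
    for (int i = 0; i < 32; i++) trace[i]      = 0x100000 + i * 64;
    for (int i = 0; i < 32; i++) trace[32 + i] = 0x200000 + i * 4096;

    int32_t open_row[NUM_BANKS] = { -1, -1, -1, -1 };
    int hits = 0, total = 0;

    for (int i = 0; i < 64; i++) {
        uint32_t row  = trace[i] >> ROW_SHIFT;
        uint32_t bank = row & BANK_MASK;      /* row-interleaved banks */
        if (open_row[bank] == (int32_t)row)
            hits++;                           /* row-buffer (page) hit */
        else
            open_row[bank] = (int32_t)row;    /* open the new row      */
        total++;
    }
    printf("row-buffer hit rate: %d/%d\n", hits, total);
    return 0;
}

The sequential half of the trace hits the open row almost every time, while the large-stride half misses on every access; that difference is the kind of locality page mode and open-row policies exploit.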