Raw Data for Coherent Network Interfaces (ISCA96)


Coherent Network Interfaces for Fine-Grain Communication. Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hill, and David A. Wood. To appear in the Proceedings of the 23rd International Symposium on Computer Architecture (ISCA), May 1996.
Raw data for Figure 6
Raw data for Figure 7
Raw data for Figure 8

Raw data for Figure 6. The following three tables show the process-to-process round-trip message latency (in microseconds) for different message sizes (in bytes).
Cache Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 3.77 3.95 4.32 4.83 5.95 9.38 12.82 19.58 33.19 61.58
Coherent Memory Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 6.46 7.32 7.91 10.13 14.59 24.68 39.00 67.10 123.74 239.60
CNI4 6.24 6.50 6.77 8.07 9.96 14.49 20.87 35.34 61.00 113.79
CNI16Q 5.49 5.72 5.83 7.11 8.50 11.86 16.38 24.90 42.37 79.79
CNI512Q 5.35 5.36 5.68 6.93 8.47 11.68 16.10 24.62 42.08 79.86
CNI16Qm 5.38 5.49 5.77 7.41 9.44 13.43 19.46 30.22 52.96 97.86
Coherent I/O Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 8.49 9.76 11.78 15.55 23.03 40.24 64.62 112.59 208.76 404.10
CNI4 8.31 8.43 9.28 11.73 14.46 20.93 30.96 49.03 84.25 157.70
CNI16Q 6.98 7.03 7.24 9.24 11.73 17.29 23.71 39.08 74.63 126.97
CNI512Q 6.60 6.68 6.89 8.93 11.43 16.67 24.41 38.09 75.30 155.45
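As an illustration of how these raw latencies can be compared (this is not part of the original study), the following Python sketch divides the NI2w round-trip latency on the coherent memory bus by each CNI's latency at the same message size. A ratio above 1.0 means that CNI completes the round trip faster than NI2w. The names `SIZES`, `MEMORY_BUS_LATENCY_US`, and `latency_ratio` are hypothetical.

```python
# Hypothetical helper, not from the original study: compares each coherent
# network interface against NI2w using the Coherent Memory Bus latency table.
SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]  # message size, bytes

# Round-trip latency in microseconds, copied from the Coherent Memory Bus table.
MEMORY_BUS_LATENCY_US = {
    "NI2w":    [6.46, 7.32, 7.91, 10.13, 14.59, 24.68, 39.00, 67.10, 123.74, 239.60],
    "CNI4":    [6.24, 6.50, 6.77, 8.07, 9.96, 14.49, 20.87, 35.34, 61.00, 113.79],
    "CNI16Q":  [5.49, 5.72, 5.83, 7.11, 8.50, 11.86, 16.38, 24.90, 42.37, 79.79],
    "CNI512Q": [5.35, 5.36, 5.68, 6.93, 8.47, 11.68, 16.10, 24.62, 42.08, 79.86],
    "CNI16Qm": [5.38, 5.49, 5.77, 7.41, 9.44, 13.43, 19.46, 30.22, 52.96, 97.86],
}


def latency_ratio(table, baseline="NI2w"):
    """Map each non-baseline NI to {size: baseline_latency / ni_latency}.

    Ratios above 1.0 mean the NI has a shorter round trip than the baseline.
    """
    base = table[baseline]
    return {
        ni: {size: round(b / v, 2) for size, b, v in zip(SIZES, base, vals)}
        for ni, vals in table.items()
        if ni != baseline
    }


if __name__ == "__main__":
    ratios = latency_ratio(MEMORY_BUS_LATENCY_US)
    print(ratios["CNI16Q"][4096])  # 239.60 / 79.79, prints 3.0
```

Applied to this table, the ratio grows with message size: for CNI16Q it rises from about 1.18 at 8 bytes to about 3.0 at 4096 bytes.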

Raw data for Figure 7. The following three tables show the process-to-process message bandwidth (in megabytes/second) for different message sizes (in bytes).
Cache Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 12.71 23.71 41.83 72.32 109.38 111.50 127.00 137.57 143.10 143.07
Coherent Memory Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 6.55 11.28 17.59 24.80 31.21 31.42 33.50 34.74 35.36 35.36
CNI4 8.42 16.33 31.07 30.78 50.22 64.00 72.00 76.17 76.68 78.25
CNI16Q 7.77 15.12 28.92 45.99 73.49 78.64 90.31 100.82 105.56 106.10
CNI512Q 9.66 18.86 35.60 53.46 80.92 84.53 92.37 102.24 106.32 106.52
CNI16Qm 10.45 20.39 38.30 55.79 73.72 82.73 84.69 93.10 94.76 95.20
CNI16Qm+snarf 10.45 20.37 38.33 57.85 91.46 102.65 121.02 131.59 136.34 137.83
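To quantify the effect of snarfing from the coherent memory bus bandwidth table, the sketch below computes the percentage change in bandwidth from CNI16Qm to CNI16Qm with snarfing at each message size. It is an illustrative calculation on the published numbers, not the authors' analysis; `CNI16QM`, `CNI16QM_SNARF`, and `percent_change` are hypothetical names.

```python
# Hypothetical helper, not from the original study: percentage change in
# bandwidth when snarfing is added to CNI16Qm on the Coherent Memory Bus.
SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]  # message size, bytes

# Bandwidth in MB/s, copied from the Coherent Memory Bus bandwidth table.
CNI16QM = [10.45, 20.39, 38.30, 55.79, 73.72, 82.73, 84.69, 93.10, 94.76, 95.20]
CNI16QM_SNARF = [10.45, 20.37, 38.33, 57.85, 91.46, 102.65, 121.02, 131.59, 136.34, 137.83]


def percent_change(before, after):
    """Return {size: 100 * (after / before - 1)}, rounded to one decimal place."""
    return {
        size: round(100.0 * (a / b - 1.0), 1)
        for size, b, a in zip(SIZES, before, after)
    }


if __name__ == "__main__":
    for size, pct in percent_change(CNI16QM, CNI16QM_SNARF).items():
        print(f"{size:5d} bytes: {pct:+.1f}%")
```

On these numbers the change is negligible for messages of 32 bytes or fewer and roughly 41 to 45 percent from 512 bytes upward.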
Coherent I/O Bus
Network Interface 8 16 32 64 128 256 512 1024 2048 4096
NI2w 4.40 7.41 11.28 15.54 18.83 18.91 20.00 20.63 20.94 20.94
CNI4 3.83 7.57 14.70 25.83 47.41 41.03 49.15 54.03 57.51 56.76
CNI16Q 5.65 11.10 21.40 29.32 56.06 54.72 57.02 63.48 79.05 66.39
CNI512Q 6.63 12.99 24.88 34.67 52.37 58.73 61.53 79.55 78.25 81.03
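One common way to summarize a bandwidth curve is its half-power point (Hockney's n½), the message size at which bandwidth reaches half its asymptotic value. The sketch below approximates it for the coherent I/O bus table under two assumptions of ours: the largest measured bandwidth stands in for the asymptotic value, and only the tabulated sizes are considered, so the result is the smallest measured size reaching at least half that peak. `IO_BUS_BANDWIDTH_MBS` and `half_power_size` are hypothetical names.

```python
# Hypothetical helper, not from the original study: approximates the
# half-power point (n_1/2) of each NI on the Coherent I/O Bus.
SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]  # message size, bytes

# Bandwidth in MB/s, copied from the Coherent I/O Bus bandwidth table.
IO_BUS_BANDWIDTH_MBS = {
    "NI2w":    [4.40, 7.41, 11.28, 15.54, 18.83, 18.91, 20.00, 20.63, 20.94, 20.94],
    "CNI4":    [3.83, 7.57, 14.70, 25.83, 47.41, 41.03, 49.15, 54.03, 57.51, 56.76],
    "CNI16Q":  [5.65, 11.10, 21.40, 29.32, 56.06, 54.72, 57.02, 63.48, 79.05, 66.39],
    "CNI512Q": [6.63, 12.99, 24.88, 34.67, 52.37, 58.73, 61.53, 79.55, 78.25, 81.03],
}


def half_power_size(bandwidths):
    """Smallest measured size whose bandwidth is at least half the peak.

    Assumption: the peak measured value approximates asymptotic bandwidth.
    """
    half_peak = max(bandwidths) / 2.0
    for size, bw in zip(SIZES, bandwidths):
        if bw >= half_peak:
            return size
    return None  # not reached for non-negative data: the peak is >= half_peak


if __name__ == "__main__":
    for ni, bws in IO_BUS_BANDWIDTH_MBS.items():
        print(ni, half_power_size(bws))
```

With these data NI2w reaches half its peak at 32 bytes and each CNI at 128 bytes, but the CNI peaks are roughly 2.7 to 3.9 times the NI2w peak, so the two numbers should be read together.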

Raw data for Figure 8. The following three tables show the execution time, in cycles, of our five macrobenchmarks on a 16-node multiprocessor.
Cache Bus
Network Interface spsolve gauss em3d moldyn appbt
NI2w 24663 77490187 15424763 168341000 172930000
Coherent Memory Bus
Network Interface spsolve gauss em3d moldyn appbt
NI2w 37879 116880042 22175848 246996000 235189000
CNI4 33393 84229341 19280920 173766000 213899000
CNI16Q 32073 80326394 19607484 173439000 202127000
CNI512Q 30529 80451588 15166338 172962000 203914000
CNI16Qm 25559 79232341 14494230 169393000 201925000
Coherent I/O Bus
Network Interface spsolve gauss em3d moldyn appbt
NI2w 52325 164149614 27762244 312592000 290393000
CNI4 43233 112528521 23543961 260576000 262071000
CNI16Q 37727 89413380 21147542 270817000 225668000
CNI512Q 33434 87437650 16185361 180005000 223538000
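Execution cycles are easiest to compare as speedups over a common baseline. The sketch below, an illustrative calculation rather than the method used to draw Figure 8, divides the NI2w cycle count on the coherent I/O bus by each other NI's cycle count for the same macrobenchmark. `BENCHMARKS`, `IO_BUS_CYCLES`, and `speedups` are hypothetical names.

```python
# Hypothetical helper, not from the original study: speedup of each NI over
# NI2w on the Coherent I/O Bus, computed as NI2w cycles / NI cycles.
BENCHMARKS = ["spsolve", "gauss", "em3d", "moldyn", "appbt"]

# Execution cycles on 16 nodes, copied from the Coherent I/O Bus table.
IO_BUS_CYCLES = {
    "NI2w":    [52325, 164149614, 27762244, 312592000, 290393000],
    "CNI4":    [43233, 112528521, 23543961, 260576000, 262071000],
    "CNI16Q":  [37727, 89413380, 21147542, 270817000, 225668000],
    "CNI512Q": [33434, 87437650, 16185361, 180005000, 223538000],
}


def speedups(table, baseline="NI2w"):
    """Map each NI to {benchmark: baseline_cycles / ni_cycles}, two decimals."""
    base = table[baseline]
    return {
        ni: {bench: round(b / c, 2) for bench, b, c in zip(BENCHMARKS, base, cycles)}
        for ni, cycles in table.items()
        if ni != baseline
    }


if __name__ == "__main__":
    for ni, per_bench in speedups(IO_BUS_CYCLES).items():
        print(ni, per_bench)
```

On this bus every CNI variant is faster than NI2w on all five macrobenchmarks; CNI512Q's speedups range from about 1.30 for appbt to about 1.88 for gauss.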

Last Updated: Wed Mar 6 12:45:08 CST 1996 by Shubhendu S. Mukherjee