Summary of SPARC Architectures

hyperSPARC
- 2-way superscalar
- 1 ALU
- 1 LSU
- 1 FPU adder and 1 FPU multiplier, but only 1 floating point instruction can be issued per cycle.
- ALU instructions produce a value that can be used in the next cycle
- The SETHI instruction produces a value which can be used by the following instruction in the same cycle.
- Stores use the LSU for 2 cycles.
- Loads use the LSU for 1 cycle, but the loaded value is not available until the cycle after next.
- The FPU adder typically takes 3 cycles to complete.
- Other FPU operations may take considerably longer.
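
The hyperSPARC latencies above can be written down as a small lookup table. This is only an illustrative sketch: the names HYPERSPARC_LATENCY and earliest_use are made up for this example, and the "fadd" entry assumes that "takes 3 cycles to complete" means the result is usable 3 cycles after issue.

```python
# Result latencies in cycles, read off the hyperSPARC list above.
# A latency of 0 means a dependent instruction can issue in the same cycle.
HYPERSPARC_LATENCY = {
    "sethi": 0,  # usable by the following instruction in the same cycle
    "alu": 1,    # usable in the next cycle
    "load": 2,   # usable in the cycle after next
    "fadd": 3,   # assumption: result usable 3 cycles after issue
}


def earliest_use(issue_cycle, producer):
    """Return the first cycle in which a consumer of `producer` can issue."""
    return issue_cycle + HYPERSPARC_LATENCY[producer]
```

For example, a load issued in cycle 10 feeds a dependent instruction no earlier than cycle 12, so roughly two cycles of independent work are needed to hide the load latency.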

SuperSPARC
- 3-way superscalar
- 2 ALU
- 1 FPU adder and 1 FPU multiplier, but only 1 floating point instruction can be issued per cycle.
- ALU instructions produce a value that can be used by a dependent instruction in the same cycle, thanks to a cascaded execution scheme.
- Loaded values are available in the next cycle.
- The FPU adder typically takes 3 cycles to complete.
- Other FPU operations may take considerably longer.
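
To see what same-cycle forwarding buys, the sketch below counts the cycles needed for a chain of dependent ALU instructions. It is a toy model built only from the ALU counts and forwarding rules listed above; the function name chain_cycles is hypothetical, and the model ignores every other grouping restriction the real hardware may have.

```python
import math


def chain_cycles(n_ops, alus, same_cycle_forwarding):
    """Cycles to execute n_ops ALU instructions, each depending on the previous.

    Toy model: with same-cycle forwarding, up to `alus` dependent
    instructions can share a cycle; without it, every dependent
    instruction has to wait for the next cycle.
    """
    if n_ops == 0:
        return 0
    if same_cycle_forwarding:
        return math.ceil(n_ops / alus)
    return n_ops
```

Under these assumptions, chain_cycles(6, 1, False) gives 6 cycles for the hyperSPARC figures, while chain_cycles(6, 2, True) gives 3 cycles for the SuperSPARC figures.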

UltraSPARC
- 4-way superscalar
- 2 ALU
- 1 LSU
- 2 FPU (can issue 2 float instructions per cycle)
- Non-blocking loads allow execution to continue even when a load misses in the first-level cache. The second-level cache latency is ~7 cycles.
- Uses branch prediction, and can place instructions from a predicted branch target in the same launch group as earlier instructions.
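
The unit counts above limit how instructions can be packed into launch groups. The sketch below is a rough, in-order greedy packer that uses only the issue width and unit counts from the UltraSPARC list; ULTRASPARC_UNITS, ISSUE_WIDTH and launch_groups are names invented for this example, and data dependencies, branches and the chip's actual grouping rules are all ignored.

```python
# Unit counts and issue width taken from the UltraSPARC list above.
ULTRASPARC_UNITS = {"alu": 2, "lsu": 1, "fpu": 2}
ISSUE_WIDTH = 4


def launch_groups(instr_classes):
    """Greedily pack an in-order stream of unit classes into launch groups.

    A new group is started when the current one is full or when the
    unit needed by the next instruction is already fully used.
    """
    groups = []
    current = []
    used = {}
    for cls in instr_classes:
        full = len(current) == ISSUE_WIDTH
        unit_busy = used.get(cls, 0) == ULTRASPARC_UNITS[cls]
        if current and (full or unit_busy):
            groups.append(current)
            current, used = [], {}
        current.append(cls)
        used[cls] = used.get(cls, 0) + 1
    if current:
        groups.append(current)
    return groups
```

In this model, two back-to-back loads always land in different groups because there is only one LSU, which is one reason to interleave memory operations with ALU or FPU work.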