| U. Wisconsin CS/ECE 752<br>Advanced Computer Architecture I                                                                                                                     |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Prof. David A. Wood                                                                                                                                                             |
| Unit 6: Dynamic Scheduling II                                                                                                                                                   |
| Slides developed by Amir Roth of University of Pennsylvania<br>with sources that included University of Wisconsin slides by<br>Mark Hill, Guri Sohi, Jim Smith, and David Wood. |
| Slides enhanced by Milo Martin, Mark Hill, and David Wood<br>with sources that included Profs. Asanovic, Falsafi, Hoe,<br>Lipasti, Shen, Smith, Sohi, Vijaykumar, and Wood      |
| CS/ECE 752 (Wood): Dynamic Scheduling II 1                                                                                                                                      |



| Superscalar + Out-of-Order + Speculation |
|------------------------------------------|
| <ul> <li>Three great tastes that taste great together</li> <li>CPI ≥ 1?</li> </ul> |
| Go superscalar |
| <ul> <li>Superscalar increases RAW hazards?</li> </ul> |
| Go out-of-order (OoO) |
| <ul> <li>RAW hazards still a problem?</li> </ul> |
| Build a larger window |
| <ul> <li>Branches a problem for filling large window?</li> </ul> |
| Add control speculation |


| • | Speculative execution requires                                                            |
|---|-------------------------------------------------------------------------------------------|
|   | (Ability to) abort & restart at every branch                                              |
|   | <ul> <li>Abort &amp; restart at every load useful for load speculation (later)</li> </ul> |
|   | <ul> <li>And for shared memory multiprocessing (much later)</li> </ul>                    |
| • | Precise synchronous (program-internal) interrupts require                                 |
|   | Abort & restart at every load, store, ??                                                  |
| • | Precise asynchronous (external) interrupts require                                        |
|   | Abort & restart at every ??                                                               |
| • | Bite the bullet                                                                           |
|   | Implement abort & restart at every insn                                                   |
|   | Called "precise state"                                                                    |

| • | Imprecise state: ignore the problem!                                                 |
|---|--------------------------------------------------------------------------------------|
|   | <ul> <li>Makes page faults (any restartable exceptions) difficult</li> </ul>         |
|   | <ul> <li>Makes speculative execution almost impossible</li> </ul>                    |
|   | IEEE standard strongly suggests precise state                                        |
|   | <ul> <li>Compromise: Alpha implemented precise state only for integer ops</li> </ul> |
| • | Force in-order completion (W): stall pipe if necessary                               |
|   | - Slow                                                                               |
| • | Precise state in software: trap to recovery routine                                  |
|   | <ul> <li>Implementation dependent</li> </ul>                                         |
|   | Trap on every mis-predicted branch (you must be joking)                              |
| • | Precise state in hardware                                                            |
|   | + Everything is better in hardware (except policy)                                   |







| ROB makes register writes in-order, but what about stores? |
|------------------------------------------------------------|
| <ul> <li>As usual, i.e., write to D\$ in X stage?</li> </ul> |
| <ul> <li>Not even close, imprecise memory worse than imprecise registers</li> <li>Especially in a multiprocessor!</li> </ul> |
| Load/store queue (LSQ) |
| Completed stores write to LSQ |
| <ul> <li>When store retires, write head of LSQ to D\$</li> </ul> |
| When loads execute, access LSQ and D\$ in parallel |
| <ul> <li>Forward from LSQ if older store with matching address</li> </ul> |
| <ul> <li>More modern design: loads and stores in separate queues</li> </ul> |
| More on this later |
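
The LSQ behavior listed above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design from the slides: the class name `LoadStoreQueue`, the `seq` program-order number, and the dictionary standing in for D$ are all assumptions made for the example. It also assumes every older store has already computed its address when a load executes.

```python
from collections import deque


class LoadStoreQueue:
    """Toy LSQ: completed stores wait here until they retire (hypothetical model)."""

    def __init__(self, dcache):
        self.dcache = dcache      # dict standing in for D$: address -> value
        self.stores = deque()     # completed, unretired stores, oldest first

    def complete_store(self, seq, addr, value):
        # A completed store writes the LSQ, not D$.
        self.stores.append((seq, addr, value))

    def execute_load(self, seq, addr):
        # Search youngest-to-oldest so the most recent older store wins.
        for s_seq, s_addr, s_val in reversed(self.stores):
            if s_seq < seq and s_addr == addr:
                return s_val      # forward from LSQ
        # No matching older store: use the D$ value (0 if never written).
        return self.dcache.get(addr, 0)

    def retire_store(self):
        # At retirement the head of the LSQ is written to D$.
        seq, addr, value = self.stores.popleft()
        self.dcache[addr] = value
```

In real hardware the LSQ search and the D$ access happen in parallel; the sequential lookup here only models which value the load ends up with.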



| <ul> <li>P6: Start with Tomasulo's algorithm, add ROB</li> <li>Separate ROB and RS</li> </ul> |
|---------------------------------------------------------------------------------------------|
| • Simple-P6 |
| • Our old RS organization: 1 ALU, 1 load, 1 store, 2 3-cycle FP |

| <ul> <li>Reservation Stations are same as before</li> </ul> |
|-------------------------------------------------------------|
| ROB |
| <ul> <li>head, tail: pointers maintain sequential order</li> </ul> |
| <ul> <li>R: insn output register, V: insn output value</li> </ul> |
| <ul> <li>Tags are different</li> </ul> |
| <ul> <li>Tomasulo: RS# → P6: ROB#</li> </ul> |
| <ul> <li>Map Table is different</li> </ul> |
| <ul> <li>T+: tag + "ready-in-ROB" bit</li> </ul> |
| <ul> <li>T==0 → Value is ready in regfile</li> </ul> |
| <ul> <li>T!=0 → Value is not ready</li> </ul> |
| <ul> <li>T!=0+ → Value is ready in the ROB</li> </ul> |
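
The three Map Table cases above can be written as a small lookup routine. This is a sketch under assumptions: the function name `read_operand`, the `(tag, ready_in_rob)` tuple encoding of T+, and the dictionary-based regfile and ROB are illustrative choices, not structures taken from the slides.

```python
def read_operand(reg, map_table, regfile, rob):
    """Return ("value", v) if the operand is available, else ("tag", rob_num).

    map_table maps a register name to (T, plus), where T is a ROB# (0 = none)
    and plus is the "ready-in-ROB" bit. Hypothetical encoding for illustration.
    """
    tag, ready_in_rob = map_table.get(reg, (0, False))
    if tag == 0:
        # T == 0: value is ready in the register file.
        return ("value", regfile[reg])
    if ready_in_rob:
        # T != 0 with "+": value is ready in the ROB entry.
        return ("value", rob[tag]["V"])
    # T != 0 without "+": value is not ready, wait on the ROB# tag.
    return ("tag", tag)
```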



ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
|    | 1 | ldf X(r1),f1 |  |  |  |  |  |
|    | 2 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 |  |  |  |  |  |
|    | 5 | ldf X(r1),f1 |  |  |  |  |  |
|    | 6 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  |    |
| f2  |    |
| r1  |    |

CDB

| T | V |
|---|---|
|   |   |

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 |
|---|-----|------|----|---|----|----|----|----|
| 1 | ALU | no   |    |   |    |    |    |    |
| 2 | LD  | no   |    |   |    |    |    |    |
| 3 | ST  | no   |    |   |    |    |    |    |
| 4 | FP1 | no   |    |   |    |    |    |    |
| 5 | FP2 | no   |    |   |    |    |    |    |

| New pipeline structure: F, D, S, X, C, R |
|------------------------------------------|
| D (dispatch) |
| <ul> <li>Structural hazard (ROB/LSQ/RS)? Stall</li> </ul> |
| <ul> <li>Allocate ROB/LSQ/RS</li> </ul> |
| <ul> <li>Set RS tag to ROB#</li> </ul> |
| <ul> <li>Set Map Table entry to ROB# and clear "ready-in-ROB" bit</li> </ul> |
| <ul> <li>Read ready registers into RS (from either ROB or Regfile)</li> </ul> |
| X (execute) |
| <ul> <li>Free RS entry</li> </ul> |
| <ul> <li>Used to be at W, can be earlier because RS#s are not tags</li> </ul> |
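
The D (dispatch) steps above can be sketched as one function. This is an illustrative model only: the dictionary layouts, the field names (`Ts`, `Vs`, `done`), the `rob_size` parameter, and the fact that ROB entries are never freed here are all assumptions made to keep the example short. The LSQ is omitted.

```python
def dispatch(insn, rob, stations, map_table, regfile, rob_size=8):
    """Dispatch one insn; return its ROB#, or None on a structural hazard.

    insn: {"op": str, "fu": str, "srcs": [reg, ...], "dst": reg or None}
    rob: list of entries, ROB# = index + 1 (no retirement in this sketch).
    """
    station = next(
        (s for s in stations if s["fu"] == insn["fu"] and not s["busy"]), None
    )
    if station is None or len(rob) >= rob_size:
        return None  # structural hazard (ROB/RS): stall

    # Allocate a ROB entry at the tail.
    rob.append({"op": insn["op"], "R": insn["dst"], "V": None, "done": False})
    rob_num = len(rob)

    # Read ready source registers (from ROB or regfile), else record the tag.
    tags, vals = [], []
    for src in insn["srcs"]:
        tag, ready_in_rob = map_table.get(src, (0, False))
        if tag == 0:
            tags.append(0)
            vals.append(regfile[src])
        elif ready_in_rob:
            tags.append(0)
            vals.append(rob[tag - 1]["V"])
        else:
            tags.append(tag)
            vals.append(None)

    # Allocate the RS entry and set its tag to the ROB#.
    station.update(busy=True, op=insn["op"], T=rob_num, Ts=tags, Vs=vals)

    # Rename the destination: ROB# with the "ready-in-ROB" bit cleared.
    # Done after reading sources so that addi r1,4,r1 sees the old r1 mapping.
    if insn["dst"] is not None:
        map_table[insn["dst"]] = (rob_num, False)
    return rob_num
```

Dispatching `ldf`, `mulf`, and `stf` from the running example in order gives ROB#1 through ROB#3, and a second `stf` stalls because the single ST station is still busy.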

| • C | (complete)                                                      |
|-----|-----------------------------------------------------------------|
|     | Structural hazard (CDB)? wait                                   |
|     | Write value into ROB entry indicated by RS tag                  |
|     | Mark ROB entry as complete                                      |
|     | If not overwritten, mark Map Table entry "ready-in-ROB" bit (+) |
| • R | (retire)                                                        |
|     | Insn at ROB head not complete ? stall                           |
|     | Handle any exceptions                                           |
|     | Write ROB head value to register file                           |
|     | If store, write LSQ head to D\$                                 |
|     | Free ROB/LSQ entries                                            |
|     |                                                                 |
|     |                                                                 |
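
The C (complete) and R (retire) steps above can be sketched the same way. The data layout is assumed for illustration: `rob` is a dictionary keyed by ROB#, Map Table entries are `(T, plus)` tuples, and `store_queue` is a plain list standing in for the LSQ. Exception handling and CDB arbitration are left out, and clearing the Map Table entry at retirement is an added assumption so that T==0 keeps meaning "value is in the regfile".

```python
def complete(rob_num, value, rob, map_table, stations):
    """C stage: write the ROB, set "ready-in-ROB" if still mapped, broadcast."""
    entry = rob[rob_num]
    entry["V"] = value
    entry["done"] = True
    reg = entry["R"]
    # Only mark "+" if a later insn has not overwritten the mapping.
    if reg is not None and map_table.get(reg, (0, False))[0] == rob_num:
        map_table[reg] = (rob_num, True)
    # CDB broadcast: waiting reservation stations grab the value.
    for station in stations:
        for i, tag in enumerate(station["Ts"]):
            if tag == rob_num:
                station["Ts"][i] = 0
                station["Vs"][i] = value


def retire(head, rob, map_table, regfile, store_queue, dcache):
    """R stage: retire the ROB head if complete; return the new head ROB#."""
    entry = rob.get(head)
    if entry is None or not entry["done"]:
        return head  # insn at ROB head not complete: stall
    reg = entry["R"]
    if reg is not None:
        regfile[reg] = entry["V"]  # write ROB head value to register file
        if map_table.get(reg, (0, False))[0] == head:
            map_table[reg] = (0, False)  # assumption: value now in regfile
    if entry.get("is_store"):
        addr, val = store_queue.pop(0)  # write LSQ head to D$
        dcache[addr] = val
    del rob[head]  # free the ROB entry
    return head + 1
```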











ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
| h  | 1 | ldf X(r1),f1 | f1 |  | c2 |  |  |
| t  | 2 | mulf f0,f1,f2 | f2 |  |  |  |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 |  |  |  |  |  |
|    | 5 | ldf X(r1),f1 |  |  |  |  |  |
|    | 6 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#1 |
| f2  | ROB#2 |
| r1  |    |

CDB

| T | V |
|---|---|
|   |   |

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | no   |    |   |    |    |    |    |      |
| 2 | LD  | yes  | ldf | ROB#1 |  |  |  | [r1] |  |
| 3 | ST  | no   |    |   |    |    |    |    |      |
| 4 | FP1 | yes  | mulf | ROB#2 |  | ROB#1 | [f0] |  | allocate |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

Annotation: set ROB# tag

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
| h  | 1 | ldf X(r1),f1 | f1 |  | c2 | c3 |  |
|    | 2 | mulf f0,f1,f2 | f2 |  |  |  |  |
| t  | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 |  |  |  |  |  |
|    | 5 | ldf X(r1),f1 |  |  |  |  |  |
|    | 6 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#1 |
| f2  | ROB#2 |
| r1  |    |

CDB

| T | V |
|---|---|
|   |   |

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | no   |    |   |    |    |    |    |      |
| 2 | LD  | no   |    |   |    |    |    |    | free |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  |  | [r1] | allocate |
| 4 | FP1 | yes  | mulf | ROB#2 |  | ROB#1 | [f0] |  |  |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
| h  | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4 |
|    | 2 | mulf f0,f1,f2 | f2 |  | c4 |  |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
| t  | 4 | addi r1,4,r1 | r1 |  |  |  |  |
|    | 5 | ldf X(r1),f1 |  |  |  |  |  |
|    | 6 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#1+ |
| f2  | ROB#2 |
| r1  | ROB#4 |

CDB

| T | V |
|---|---|
| ROB#1 | [f1] |

ldf finished:
1. set "ready-in-ROB" bit
2. write result to ROB
3. CDB broadcast

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | yes  | add | ROB#4 |  |  | [r1] |  | allocate |
| 2 | LD  | no   |    |   |    |    |    |    |      |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  |  | [r1] |  |
| 4 | FP1 | yes  | mulf | ROB#2 |  | ROB#1 | [f0] |  |  |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
|    | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 |  | c4 | c5 |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 | r1 |  | c5 |  |  |
| t  | 5 | ldf X(r1),f1 | f1 |  |  |  |  |
|    | 6 | mulf f0,f1,f2 |  |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#5 |
| f2  | ROB#2 |
| r1  | ROB#4 |

CDB

| T | V |
|---|---|
|   |   |

ldf retires:
1. write ROB result to regfile

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | yes  | add | ROB#4 |  |  | [r1] |  |  |
| 2 | LD  | yes  | ldf | ROB#5 |  | ROB#4 |  |  | allocate |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  |  | [r1] |  |
| 4 | FP1 | no   |    |   |    |    |    |    |      |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
|    | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 |  | c4 | c5+ |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 | r1 |  | c5 | c6 |  |
|    | 5 | ldf X(r1),f1 | f1 |  |  |  |  |
| t  | 6 | mulf f0,f1,f2 | f2 |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#5 |
| f2  | ROB#6 |
| r1  | ROB#4 |

CDB

| T | V |
|---|---|
|   |   |

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | no   |    |   |    |    |    |    | free |
| 2 | LD  | yes  | ldf | ROB#5 |  | ROB#4 |  |  |  |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  |  | [r1] |  |
| 4 | FP1 | yes  | mulf | ROB#6 |  | ROB#5 | [f0] |  | allocate |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
|    | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 |  | c4 | c5+ |  |
|    | 3 | stf f2,Z(r1) |  |  |  |  |  |
|    | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7 |
|    | 5 | ldf X(r1),f1 | f1 |  | c7 |  |  |
| t  | 6 | mulf f0,f1,f2 | f2 |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#5 |
| f2  | ROB#6 |
| r1  | ROB#4+ |

CDB

| T | V |
|---|---|
| ROB#4 | [r1] |

stall D (no free ST RS)

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | no   |    |   |    |    |    |    |      |
| 2 | LD  | yes  | ldf | ROB#5 |  | ROB#4 |  | CDB.V | ROB#4 ready, grab CDB.V |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  |  | [r1] |  |
| 4 | FP1 | yes  | mulf | ROB#6 |  | ROB#5 | [f0] |  |  |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB

| ht | # | Insn | R | V | S | X | C |
|----|---|------|---|---|---|---|---|
|    | 1 | ldf X(r1),f1 | f1 | [f1] | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8 |
|    | 3 | stf f2,Z(r1) |  |  | c8 |  |  |
|    | 4 | addi r1,4,r1 | r1 | [r1] | c5 | c6 | c7 |
|    | 5 | ldf X(r1),f1 | f1 |  | c7 | c8 |  |
| t  | 6 | mulf f0,f1,f2 | f2 |  |  |  |  |
|    | 7 | stf f2,Z(r1) |  |  |  |  |  |

Map Table

| Reg | T+ |
|-----|----|
| f0  |    |
| f1  | ROB#5 |
| f2  | ROB#6 |
| r1  | ROB#4+ |

CDB

| T | V |
|---|---|
| ROB#2 | [f2] |

addi has completed but must wait to retire (in-order commit)

ROB#2 invalid in Map Table: don't set "ready-in-ROB"

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 | Note |
|---|-----|------|----|---|----|----|----|----|------|
| 1 | ALU | no   |    |   |    |    |    |    |      |
| 2 | LD  | no   |    |   |    |    |    |    |      |
| 3 | ST  | yes  | stf | ROB#3 | ROB#2 |  | [f2] | [r1] | ROB#2 ready |
| 4 | FP1 | yes  | mulf | ROB#6 |  | ROB#5 | [f0] |  |  |
| 5 | FP2 | no   |    |   |    |    |    |    |      |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C  | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4 | f0            |              | ROB#5 | [f1]  |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8 | f1            | ROB#5+       |       |       |
| h  | 3 | stf f2,Z(r1)  |    |      | c8 | c9  |    | f2            | ROB#6        |       |       |
|    | 4 | addi r1,4,r1  | r1 | [r1] | c5 | c6  | c7 | r1            | ROB#4+       |       |       |
|    | 5 | ldf X(r1),f1  | f1 | [f1] | c7 | c8  | c9 |               |              |       |       |
|    | 6 | mulf f0,f1,f2 | f2 |      | c9 |     |    |               |              |       |       |
| t  | 7 | stf f2,Z(r1)  |    |      |    |     |    |               |              |       |       |

Notes: retire mulf; all pipe stages active at once

Reservation Stations

| # | FU  | busy | op   | T     | T1    | T2    | V1   | V2      | Notes                   |
|---|-----|------|------|-------|-------|-------|------|---------|-------------------------|
| 1 | ALU | no   |      |       |       |       |      |         |                         |
| 2 | LD  | no   |      |       |       |       |      |         |                         |
| 3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V | free, re-allocate       |
| 4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] | CDB.V   | ROB#5 ready, grab CDB.V |
| 5 | FP2 | no   |      |       |       |       |      |         |                         |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C   | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|-----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4  | f0            |              |       |       |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8  | f1            | ROB#5+       |       |       |
| h  | 3 | stf f2,Z(r1)  |    |      | c8 | c9  | c10 | f2            | ROB#6        |       |       |
|    | 4 | addi r1,4,r1  | r1 | [r1] | c5 | c6  | c7  | r1            | ROB#4+       |       |       |
|    | 5 | ldf X(r1),f1  | f1 | [f1] | c7 | c8  | c9  |               |              |       |       |
|    | 6 | mulf f0,f1,f2 | f2 |      | c9 | c10 |     |               |              |       |       |
| t  | 7 | stf f2,Z(r1)  |    |      |    |     |     |               |              |       |       |

Reservation Stations

| # | FU  | busy | op  | T     | T1    | T2 | V1 | V2      | Notes |
|---|-----|------|-----|-------|-------|----|----|---------|-------|
| 1 | ALU | no   |     |       |       |    |    |         |       |
| 2 | LD  | no   |     |       |       |    |    |         |       |
| 3 | ST  | yes  | stf | ROB#7 | ROB#6 |    |    | ROB#4.V |       |
| 4 | FP1 | no   |     |       |       |    |    |         | free  |
| 5 | FP2 | no   |     |       |       |    |    |         |       |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C   | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|-----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4  | f0            |              |       |       |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8  | f1            | ROB#5+       |       |       |
|    | 3 | stf f2,Z(r1)  |    |      | c8 | c9  | c10 | f2            | ROB#6        |       |       |
| h  | 4 | addi r1,4,r1  | r1 | [r1] | c5 | c6  | c7  | r1            | ROB#4+       |       |       |
|    | 5 | ldf X(r1),f1  | f1 | [f1] | c7 | c8  | c9  |               |              |       |       |
|    | 6 | mulf f0,f1,f2 | f2 |      | c9 | c10 |     |               |              |       |       |
| t  | 7 | stf f2,Z(r1)  |    |      |    |     |     |               |              |       |       |

Notes: retire stf

Reservation Stations

| # | FU  | busy | op  | T     | T1    | T2 | V1 | V2      |
|---|-----|------|-----|-------|-------|----|----|---------|
| 1 | ALU | no   |     |       |       |    |    |         |
| 2 | LD  | no   |     |       |       |    |    |         |
| 3 | ST  | yes  | stf | ROB#7 | ROB#6 |    |    | ROB#4.V |
| 4 | FP1 | no   |     |       |       |    |    |         |
| 5 | FP2 | no   |     |       |       |    |    |         |

| Point of ROB is maintaining precise state |
|-------------------------------------------|
| <ul> <li>How does that work? Easy as 1, 2, 3</li> </ul> |
| <ol> <li>Wait until last good insn retires, first bad insn at ROB head</li> <li>Clear contents of ROB, RS, and Map Table</li> <li>Start over</li> </ol> |
| <ul> <li>Works because zero (0) means the right thing <ul> <li>0 in ROB/RS → entry is empty</li> <li>Tag == 0 in Map Table → register is in regfile</li> </ul> </li> </ul> |
| <ul> <li>...and because regfile and D\$ writes take place at R</li> </ul> |
| <ul> <li>Example: page fault in first stf</li> </ul> |

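The three recovery steps above can be sketched in a few lines of Python. This is a toy model, not the P6 hardware: the `P6State` class, the dictionary-based ROB entries, and the `retire_or_recover` helper are all names assumed for illustration, and an empty Map Table stands in for "tag == 0 everywhere". The state loosely follows the page-fault example (mulf at the ROB head, then the faulting stf).

```python
from dataclasses import dataclass, field


@dataclass
class P6State:
    """Toy P6 bookkeeping (hypothetical names, for illustration only)."""
    rob: list = field(default_factory=list)        # in-flight insns, head at index 0
    rs: dict = field(default_factory=dict)         # FU name -> insn number (absent = free)
    map_table: dict = field(default_factory=dict)  # reg -> ROB# tag; absent = "in regfile"
    regfile: dict = field(default_factory=dict)    # architectural (precise) state


def make_insn(num, dest=None, value=None, fault=False):
    return {"num": num, "dest": dest, "value": value, "fault": fault}


def retire_or_recover(state):
    """Retire the ROB head, or flush everything if the head is the faulting insn."""
    head = state.rob[0]
    if head["fault"]:
        # Step 2: clear ROB, RS, and Map Table. With an empty Map Table every
        # register is read from the regfile, which only holds retired values.
        state.rob.clear()
        state.rs.clear()
        state.map_table.clear()
        return head["num"]  # Step 3: start over at the faulting insn
    if head["dest"] is not None:
        state.regfile[head["dest"]] = head["value"]  # regfile is written only at R
        if state.map_table.get(head["dest"]) == head["num"]:
            del state.map_table[head["dest"]]
    state.rob.pop(0)
    return None


state = P6State(
    rob=[make_insn(2, "f2", 6.0), make_insn(3, fault=True), make_insn(4, "r1", 104)],
    rs={"ST": 7, "FP1": 6},
    map_table={"f1": 5, "f2": 6, "r1": 4},
    regfile={"f0": 2.0, "f1": 3.0, "f2": 0.0, "r1": 100},
)
first = retire_or_recover(state)       # Step 1: the last good insn (mulf) retires
restart_at = retire_or_recover(state)  # faulting stf is now at the head: flush
print(first, restart_at, state.regfile)
```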
ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C  | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4 | f0            |              | ROB#5 | [f1]  |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8 | f1            | ROB#5+       |       |       |
| h  | 3 | stf f2,Z(r1)  |    |      | c8 | c9  |    | f2            | ROB#6        |       |       |
|    | 4 | addi r1,4,r1  | r1 | [r1] | c5 | c6  | c7 | r1            | ROB#4+       |       |       |
|    | 5 | ldf X(r1),f1  | f1 | [f1] | c7 | c8  | c9 |               |              |       |       |
|    | 6 | mulf f0,f1,f2 | f2 |      | c9 |     |    |               |              |       |       |
| t  | 7 | stf f2,Z(r1)  |    |      |    |     |    |               |              |       |       |

Notes: PAGE FAULT

Reservation Stations

| # | FU  | busy | op   | T     | T1    | T2    | V1   | V2      |
|---|-----|------|------|-------|-------|-------|------|---------|
| 1 | ALU | no   |      |       |       |       |      |         |
| 2 | LD  | no   |      |       |       |       |      |         |
| 3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V |
| 4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] | CDB.V   |
| 5 | FP2 | no   |      |       |       |       |      |         |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C  | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4 | f0            |              |       |       |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8 | f1            |              |       |       |
|    | 3 | stf f2,Z(r1)  |    |      |    |     |    | f2            |              |       |       |
|    | 4 | addi r1,4,r1  |    |      |    |     |    | r1            |              |       |       |
|    | 5 | ldf X(r1),f1  |    |      |    |     |    |               |              |       |       |
|    | 6 | mulf f0,f1,f2 |    |      |    |     |    |               |              |       |       |
|    | 7 | stf f2,Z(r1)  |    |      |    |     |    |               |              |       |       |

Notes: faulting insn at ROB head? CLEAR EVERYTHING

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 | V1 | V2 |
|---|-----|------|----|---|----|----|----|----|
| 1 | ALU | no   |    |   |    |    |    |    |
| 2 | LD  | no   |    |   |    |    |    |    |
| 3 | ST  | no   |    |   |    |    |    |    |
| 4 | FP1 | no   |    |   |    |    |    |    |
| 5 | FP2 | no   |    |   |    |    |    |    |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S  | X   | C  | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|----|-----|----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2 | c3  | c4 | f0            |              |       |       |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4 | c5+ | c8 | f1            |              |       |       |
| ht | 3 | stf f2,Z(r1)  |    |      |    |     |    | f2            |              |       |       |
|    | 4 | addi r1,4,r1  |    |      |    |     |    | r1            |              |       |       |
|    | 5 | ldf X(r1),f1  |    |      |    |     |    |               |              |       |       |
|    | 6 | mulf f0,f1,f2 |    |      |    |     |    |               |              |       |       |
|    | 7 | stf f2,Z(r1)  |    |      |    |     |    |               |              |       |       |

Notes: START OVER (after OS fixes page fault)

Reservation Stations

| # | FU  | busy | op  | T     | T1 | T2 | V1   | V2   |
|---|-----|------|-----|-------|----|----|------|------|
| 1 | ALU | no   |     |       |    |    |      |      |
| 2 | LD  | no   |     |       |    |    |      |      |
| 3 | ST  | yes  | stf | ROB#3 |    |    | [f2] | [r1] |
| 4 | FP1 | no   |     |       |    |    |      |      |
| 5 | FP2 | no   |     |       |    |    |      |      |

ROB, Map Table, CDB

| ht | # | Insn          | R  | V    | S   | X   | C  | Map Table Reg | Map Table T+ | CDB T | CDB V |
|----|---|---------------|----|------|-----|-----|----|---------------|--------------|-------|-------|
|    | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4 | f0            |              |       |       |
|    | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8 | f1            |              |       |       |
| h  | 3 | stf f2,Z(r1)  |    |      | c12 |     |    | f2            |              |       |       |
| t  | 4 | addi r1,4,r1  | r1 |      |     |     |    | r1            | ROB#4        |       |       |
|    | 5 | ldf X(r1),f1  |    |      |     |     |    |               |              |       |       |
|    | 6 | mulf f0,f1,f2 |    |      |     |     |    |               |              |       |       |
|    | 7 | stf f2,Z(r1)  |    |      |     |     |    |               |              |       |       |

Reservation Stations

| # | FU  | busy | op   | T     | T1 | T2 | V1   | V2   |
|---|-----|------|------|-------|----|----|------|------|
| 1 | ALU | yes  | addi | ROB#4 |    |    | [r1] |      |
| 2 | LD  | no   |      |       |    |    |      |      |
| 3 | ST  | yes  | stf  | ROB#3 |    |    | [f2] | [r1] |
| 4 | FP1 | no   |      |       |    |    |      |      |
| 5 | FP2 | no   |      |       |    |    |      |      |



| <ul> <li>Popular design for a while <ul> <li>(Relatively) easy to implement correctly</li> <li>Anything goes wrong (mispredicted branch, fault, interrupt)? Just clear everything and start again</li> <li>Examples: Intel PentiumPro, IBM/Motorola PowerPC, AMD K6</li> </ul> </li> </ul> |
|---------------------------------------------------------------------------------|
| <ul> <li>Making a comeback</li> </ul> |
| <ul> <li>But went away for a while, why?</li> </ul> |







| Parameters |
|------------|
| <ul> <li>Names: r1,r2,r3</li> <li>Locations: l1,l2,l3,l4,l5,l6,l7</li> <li>Original mapping: r1→l1, r2→l2, r3→l3, l4–l7 are "free"</li> </ul> |

| MapTable (r1 r2 r3) | FreeList    | Raw insns    | Renamed insns |
|---------------------|-------------|--------------|---------------|
| l1 l2 l3            | l4,l5,l6,l7 | add r2,r3,r1 | add l2,l3,l4  |
| l4 l2 l3            | l5,l6,l7    | sub r2,r1,r3 | sub l2,l4,l5  |
| l4 l2 l5            | l6,l7       | mul r2,r3,r1 | mul l2,l5,l6  |
| l6 l2 l5            | l7          | div r1,r3,r2 | div l6,l5,l7  |
| • Question: how is the insn after div renamed? | | | |
| • We are out of free locations (physical registers) | | | |

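The renaming walk-through above can be reproduced with a short sketch. Everything here is assumed for illustration: the `rename` helper, the `(op, src1, src2, dest)` tuple layout, the spelling of locations as the strings `l1` to `l7`, and the fifth instruction, which is a made-up insn added only to show what happens once the free list is empty.

```python
def rename(insns, map_table, free_list):
    """Rename (op, src1, src2, dest) insns; stop when no free location is left."""
    map_table = dict(map_table)  # work on copies so the caller's state is untouched
    free_list = list(free_list)
    renamed = []
    for op, src1, src2, dest in insns:
        if not free_list:
            break  # out of free locations: this insn cannot be renamed yet
        # Look up the sources BEFORE remapping the destination, so an insn
        # that reads and writes the same name sees the old location.
        s1, s2 = map_table[src1], map_table[src2]
        new_loc = free_list.pop(0)
        map_table[dest] = new_loc
        renamed.append((op, s1, s2, new_loc))
    return renamed, map_table, free_list


insns = [
    ("add", "r2", "r3", "r1"),
    ("sub", "r2", "r1", "r3"),
    ("mul", "r2", "r3", "r1"),
    ("div", "r1", "r3", "r2"),
    ("add", "r1", "r2", "r3"),  # hypothetical fifth insn, not on the slide
]
renamed, final_map, final_free = rename(
    insns,
    {"r1": "l1", "r2": "l2", "r3": "l3"},
    ["l4", "l5", "l6", "l7"],
)
for insn in renamed:
    print(insn)
print("map:", final_map, "free:", final_free)
```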
| • P6 |
|------|
| <ul> <li>No need to free storage for speculative ("in-flight") values explicitly</li> <li>Temporary storage comes with ROB entry</li> <li>R: copy speculative value from ROB to register file, free ROB entry</li> </ul> |
| • R10K |
| <ul> <li>Can't free physical register when insn retires</li> <li>No architectural register to copy value to</li> </ul> |
| • But... |
| <ul> <li>Can free physical register previously mapped to same logical register</li> <li>Why? All insns that will ever read its value have retired</li> </ul> |

| MapTable | FreeList    | Raw insns    | Renamed insns              |
|----------|-------------|--------------|----------------------------|
| r1 r2 r3 |             |              |                            |
| l1 l2 l3 | l4,l5,l6,l7 | add r2,r3,r1 | add l2,l3,<mark>l4</mark>  |
| l4 l2 l3 | l5,l6,l7    | sub r2,r1,r3 | sub l2,l4,l5               |
| l4 l2 l5 | l6,l7       | mul r2,r3,r1 | mul l2,l5,l6               |
| l6 l2 l5 | l7          | div r1,r3,r2 | div l6,l5,l7               |
| • When add retires, free l1 | | | |
| • When sub retires, free l3 | | | |
| • When mul retires, free ?  | | | |
| • When div retires, free ?  | | | |
| • See the pattern?          | | | |

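The pattern asked about above is that a retiring insn frees the location its destination register was mapped to before it was renamed. A minimal sketch, assuming locations are written as the strings `l1` to `l7` and using hypothetical helper names (`dispatch_all`, `retire_head`) and a list-of-dicts ROB:

```python
def dispatch_all(insns, map_table, free_list):
    """Rename each insn, remembering the previous mapping of its destination (Told)."""
    rob = []
    for op, _src1, _src2, dest in insns:
        told = map_table[dest]            # location about to be overwritten in the map
        map_table[dest] = free_list.pop(0)
        rob.append({"op": op, "T": map_table[dest], "Told": told})
    return rob


def retire_head(rob, free_list):
    """Retire the oldest insn and return its previous mapping to the free list."""
    entry = rob.pop(0)
    free_list.append(entry["Told"])
    return entry["op"], entry["Told"]


map_table = {"r1": "l1", "r2": "l2", "r3": "l3"}
free_list = ["l4", "l5", "l6", "l7"]
rob = dispatch_all(
    [("add", "r2", "r3", "r1"), ("sub", "r2", "r1", "r3"),
     ("mul", "r2", "r3", "r1"), ("div", "r1", "r3", "r2")],
    map_table, free_list,
)
freed = [retire_head(rob, free_list) for _ in range(4)]
print(freed)  # [('add', 'l1'), ('sub', 'l3'), ('mul', 'l4'), ('div', 'l2')]
```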
| • New tags (again) |
|--------------------|
| <ul> <li>P6: ROB# → R10K: PR#</li> </ul> |
| • ROB |
| <ul> <li>R: logical output register</li> <li>Told: physical register previously mapped to insn's logical output</li> </ul> |
| • RS |
| <ul> <li>T, T1, T2: output, input physical registers</li> </ul> |
| • Map Table |
| <ul> <li>T+: PR# (never empty) + "ready" bit</li> </ul> |
| • Free List |
| <ul> <li>T: PR#</li> </ul> |
| • No values in ROB, RS, or on CDB |
| • Yeager paper uses different names, what are they? |


ROB, Map Table, CDB

| ht | # | Insn          | R | Told | S | X | C | Map Table Reg | Map Table T+ | CDB T |
|----|---|---------------|---|------|---|---|---|---------------|--------------|-------|
|    | 1 | ldf X(r1),f1  |   |      |   |   |   | f0            | PR#1+        |       |
|    | 2 | mulf f0,f1,f2 |   |      |   |   |   | f1            | PR#2+        |       |
|    | 3 | stf f2,Z(r1)  |   |      |   |   |   | f2            | PR#3+        |       |
|    | 4 | addi r1,4,r1  |   |      |   |   |   | r1            | PR#4+        |       |
|    | 5 | ldf X(r1),f1  |   |      |   |   |   |               |              |       |
|    | 6 | mulf f0,f1,f2 |   |      |   |   |   |               |              |       |
|    | 7 | stf f2,Z(r1)  |   |      |   |   |   |               |              |       |

Free List: PR#5, PR#6, PR#7, PR#8

Notes: Notice I: no values anywhere; Notice II: MapTable is never empty

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 |
|---|-----|------|----|---|----|----|
| 1 | ALU | no   |    |   |    |    |
| 2 | LD  | no   |    |   |    |    |
| 3 | ST  | no   |    |   |    |    |
| 4 | FP1 | no   |    |   |    |    |
| 5 | FP2 | no   |    |   |    |    |

| R10K Pipeline |
|---------------|
| <ul> <li>R10K pipeline structure: F, D, S, X, C, R</li> </ul> |
| • D (dispatch) |
| <ul> <li>Structural hazard (RS, ROB, LSQ, physical registers)? Stall</li> <li>Allocate RS, ROB, LSQ entries and new physical register (T)</li> <li>Record previously mapped physical register (Told)</li> <li>Update map table</li> </ul> |
| • C (complete) |
| <ul> <li>Write destination physical register, set Ready in MT</li> </ul> |
| • R (retire) |
| <ul> <li>ROB head not complete? Stall</li> <li>Handle any exceptions</li> <li>Store: write LSQ head to D\$</li> <li>Free ROB, LSQ entries</li> <li>Free previous physical register (Told)</li> </ul> |

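The D, C, and R bookkeeping listed above can be sketched as a small Python class. This is an illustrative model under stated assumptions, not the R10K implementation: the class name `R10KRename`, its method names, and the integer PR numbers are hypothetical, and RS, LSQ, values, and exceptions are left out so that only the map table, free list, and ROB remain.

```python
from collections import deque


class R10KRename:
    """Toy model of map table + free list + ROB (hypothetical names)."""

    def __init__(self, map_table, free_list, rob_size=8):
        self.map = {reg: [pr, True] for reg, pr in map_table.items()}  # reg -> [PR#, ready]
        self.free = deque(free_list)
        self.rob = deque()
        self.rob_size = rob_size

    def dispatch(self, op, dest=None):
        """D: return False (stall) on a structural hazard, otherwise allocate."""
        if len(self.rob) >= self.rob_size or (dest is not None and not self.free):
            return False
        t = told = None
        if dest is not None:              # stores have no dest, so no preg is allocated
            told = self.map[dest][0]      # record previously mapped register (Told)
            t = self.free.popleft()       # new physical register (T)
            self.map[dest] = [t, False]   # update map table, not ready yet
        self.rob.append({"op": op, "dest": dest, "T": t, "Told": told, "done": False})
        return True

    def complete(self, index):
        """C: mark the insn complete and set the ready bit if the map still names its T."""
        entry = self.rob[index]
        entry["done"] = True
        if entry["dest"] is not None and self.map[entry["dest"]][0] == entry["T"]:
            self.map[entry["dest"]][1] = True

    def retire(self):
        """R: stall unless the head is complete; free the ROB entry and Told."""
        if not self.rob or not self.rob[0]["done"]:
            return False
        entry = self.rob.popleft()
        if entry["Told"] is not None:
            self.free.append(entry["Told"])
        return True


core = R10KRename({"f0": 1, "f1": 2, "f2": 3, "r1": 4}, [5, 6, 7, 8])
for op, dest in [("ldf", "f1"), ("mulf", "f2"), ("stf", None), ("addi", "r1")]:
    core.dispatch(op, dest)
core.complete(0)  # ldf completes: f1's mapping becomes ready
core.retire()     # ldf retires: its Told (PR#2) returns to the free list
print(core.map, list(core.free))
```

The initial mapping and the dispatched instructions mirror the running ldf/mulf/stf/addi example, so the resulting mappings (f1 to PR#5, f2 to PR#6, r1 to PR#7) can be checked against the trace tables.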










ROB, Map Table, CDB

| ht | # | Insn          | R  | Told | S  | X  | C | Map Table Reg | Map Table T+ | CDB T |
|----|---|---------------|----|------|----|----|---|---------------|--------------|-------|
| h  | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 |   | f0            | PR#1+        |       |
|    | 2 | mulf f0,f1,f2 | f2 | PR#3 |    |    |   | f1            | PR#5         |       |
| t  | 3 | stf f2,Z(r1)  |    |      |    |    |   | f2            | PR#6         |       |
|    | 4 | addi r1,4,r1  |    |      |    |    |   | r1            | PR#4+        |       |
|    | 5 | ldf X(r1),f1  |    |      |    |    |   |               |              |       |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |   |               |              |       |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |   |               |              |       |

Free List: PR#7, PR#8

Notes: Stores are not allocated pregs

Reservation Stations

| # | FU  | busy | op   | T    | T1    | T2    | Notes |
|---|-----|------|------|------|-------|-------|-------|
| 1 | ALU | no   |      |      |       |       |       |
| 2 | LD  | no   |      |      |       |       | free  |
| 3 | ST  | yes  | stf  |      | PR#6  | PR#4+ |       |
| 4 | FP1 | yes  | mulf | PR#6 | PR#1+ | PR#5  |       |
| 5 | FP2 | no   |      |      |       |       |       |

ROB

| ht | # | Insn          | R  | Told | S  | X  | C  |
|----|---|---------------|----|------|----|----|----|
| h  | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 | c4 |
|    | 2 | mulf f0,f1,f2 | f2 | PR#3 | c4 |    |    |
|    | 3 | stf f2,Z(r1)  |    |      |    |    |    |
| t  | 4 | addi r1,4,r1  | r1 | PR#4 |    |    |    |
|    | 5 | ldf X(r1),f1  |    |      |    |    |    |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |    |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |    |

Map Table

| Reg | T+    |
|-----|-------|
| f0  | PR#1+ |
| f1  | PR#5+ |
| f2  | PR#6  |
| r1  | PR#7  |

Free List: PR#8 (PR#7 was just allocated to r1)

CDB: T = PR#5

Reservation Stations

| # | FU  | busy | op   | T    | T1    | T2    |
|---|-----|------|------|------|-------|-------|
| 1 | ALU | yes  | addi | PR#7 | PR#4+ |       |
| 2 | LD  | no   |      |      |       |       |
| 3 | ST  | yes  | stf  |      | PR#6  | PR#4+ |
| 4 | FP1 | yes  | mulf | PR#6 | PR#1+ | PR#5+ |
| 5 | FP2 | no   |      |      |       |       |

Notes:
- ldf completes: set MapTable ready bit
- Match PR#5 tag from CDB & issue



| <ul> <li>Problem with R10K design? Precise state is more complex</li> <li>Physical registers are written out-of-order (at C)</li> </ul> | |
|---|---|
| <ul> <li>That's OK, there is no architectural register file</li> </ul> | |
| <ul> <li>We can "free" written registers and "restore" old ones</li> </ul> | |
| Do this by manipulating the Map Table and Free List | |
| • Two ways of restoring Map Table and Free List | |
| Option I: serial rollback using R, T<sub>old</sub> ROB fields | |
| ± Slow, but simple | |
| <ul> <li>Option II: single-cycle restoration from some checkpoint</li> </ul> | |
| ± Fast, but checkpoints are expensive | |
| <ul> <li>Modern processor compromise: make common case fast</li> </ul> | |
| Checkpoint only (low-confidence) branches (frequent rollbacks) | |
| <ul> <li>Serial recovery for page-faults and interrupts (rare rollbacks)</li> </ul> | |

ROB

| ht | # | Insn          | R  | Told | S  | X  | C  |
|----|---|---------------|----|------|----|----|----|
|    | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 | PR#3 | c4 | c5 |    |
|    | 3 | stf f2,Z(r1)  |    |      |    |    |    |
|    | 4 | addi r1,4,r1  | r1 | PR#4 | c5 |    |    |
| t  | 5 | ldf X(r1),f1  | f1 | PR#5 |    |    |    |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |    |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |    |

Map Table

| Reg | T+    |
|-----|-------|
| f0  | PR#1+ |
| f1  | PR#8  |
| f2  | PR#6  |
| r1  | PR#7  |

Free List: PR#2 (PR#8 was just allocated to f1)

CDB: (empty)

Reservation Stations

| # | FU  | busy | op   | T    | T1    | T2    |
|---|-----|------|------|------|-------|-------|
| 1 | ALU | yes  | addi | PR#7 | PR#4+ |       |
| 2 | LD  | yes  | ldf  | PR#8 |       | PR#7  |
| 3 | ST  | yes  | stf  |      | PR#6  | PR#4+ |
| 4 | FP1 | no   |      |      |       |       |
| 5 | FP2 | no   |      |      |       |       |

Notes:
- undo insns 3-5 (doesn't matter why)
- use serial rollback

ROB

| ht | # | Insn          | R  | Told | S  | X  | C  |
|----|---|---------------|----|------|----|----|----|
|    | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 | PR#3 | c4 | c5 |    |
|    | 3 | stf f2,Z(r1)  |    |      |    |    |    |
| t  | 4 | addi r1,4,r1  | r1 | PR#4 | c5 |    |    |
|    | 5 | ldf X(r1),f1  | f1 | PR#5 |    |    |    |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |    |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |    |

Map Table

| Reg | T+                |
|-----|-------------------|
| f0  | PR#1+             |
| f1  | PR#5+ (was PR#8)  |
| f2  | PR#6              |
| r1  | PR#7              |

Free List: PR#2, PR#8 (PR#8 just returned)

CDB: (empty)

Reservation Stations

| # | FU  | busy | op   | T    | T1    | T2    |
|---|-----|------|------|------|-------|-------|
| 1 | ALU | yes  | addi | PR#7 | PR#4+ |       |
| 2 | LD  | no   |      |      |       |       |
| 3 | ST  | yes  | stf  |      | PR#6  | PR#4+ |
| 4 | FP1 | no   |      |      |       |       |
| 5 | FP2 | no   |      |      |       |       |

undo ldf (ROB#5)
1. free RS
2. free T (PR#8), return to FreeList
3. restore MT[f1] to Told (PR#5)
4. free ROB#5

ROB

| ht | # | Insn          | R  | Told | S  | X  | C  |
|----|---|---------------|----|------|----|----|----|
|    | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 | c4 |
| h  | 2 | mulf f0,f1,f2 | f2 | PR#3 | c4 | c5 |    |
| t  | 3 | stf f2,Z(r1)  |    |      |    |    |    |
|    | 4 | addi r1,4,r1  | r1 | PR#4 | c5 |    |    |
|    | 5 | ldf X(r1),f1  |    |      |    |    |    |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |    |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |    |

Map Table

| Reg | T+                |
|-----|-------------------|
| f0  | PR#1+             |
| f1  | PR#5+             |
| f2  | PR#6              |
| r1  | PR#4+ (was PR#7)  |

Free List: PR#2, PR#8, PR#7 (PR#7 just returned)

CDB: (empty)

Reservation Stations

| # | FU  | busy | op  | T | T1   | T2    |
|---|-----|------|-----|---|------|-------|
| 1 | ALU | no   |     |   |      |       |
| 2 | LD  | no   |     |   |      |       |
| 3 | ST  | yes  | stf |   | PR#6 | PR#4+ |
| 4 | FP1 | no   |     |   |      |       |
| 5 | FP2 | no   |     |   |      |       |

undo addi (ROB#4)
1. free RS
2. free T (PR#7), return to FreeList
3. restore MT[r1] to Told (PR#4)
4. free ROB#4

ROB

| ht | # | Insn          | R  | Told | S  | X  | C  |
|----|---|---------------|----|------|----|----|----|
|    | 1 | ldf X(r1),f1  | f1 | PR#2 | c2 | c3 | c4 |
| ht | 2 | mulf f0,f1,f2 | f2 | PR#3 | c4 | c5 |    |
|    | 3 | stf f2,Z(r1)  |    |      |    |    |    |
|    | 4 | addi r1,4,r1  |    |      |    |    |    |
|    | 5 | ldf X(r1),f1  |    |      |    |    |    |
|    | 6 | mulf f0,f1,f2 |    |      |    |    |    |
|    | 7 | stf f2,Z(r1)  |    |      |    |    |    |

Map Table

| Reg | T+    |
|-----|-------|
| f0  | PR#1+ |
| f1  | PR#5+ |
| f2  | PR#6  |
| r1  | PR#4+ |

Free List: PR#2, PR#8, PR#7

CDB: (empty)

Reservation Stations

| # | FU  | busy | op | T | T1 | T2 |
|---|-----|------|----|---|----|----|
| 1 | ALU | no   |    |   |    |    |
| 2 | LD  | no   |    |   |    |    |
| 3 | ST  | no   |    |   |    |    |
| 4 | FP1 | no   |    |   |    |    |
| 5 | FP2 | no   |    |   |    |    |

undo stf (ROB#3)
1. free RS
2. free ROB#3
3. no registers to restore/free
4. how is D\$ write undone?
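
The serial rollback walked through above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the R10K hardware: the function name `serial_rollback`, the `(insn, R, Told)` tuple layout, and the use of plain lists and dicts are hypothetical, and ready bits (`+`) and reservation stations are left out. T is read from the Map Table, which works only because entries are undone youngest-first.

```python
def serial_rollback(rob, map_table, free_list, keep):
    """Undo ROB entries youngest-first until only `keep` entries remain."""
    while len(rob) > keep:
        insn, r, t_old = rob.pop()        # entry at the ROB tail
        if r is None:                     # e.g. a store: no register to restore
            continue
        free_list.append(map_table[r])    # free T, return it to the free list
        map_table[r] = t_old              # restore MT[R] to Told


# State taken from the walkthrough (head is insn 2, tail is insn 5).
rob = [
    ("mulf f0,f1,f2", "f2", "PR#3"),
    ("stf f2,Z(r1)", None, None),
    ("addi r1,4,r1", "r1", "PR#4"),
    ("ldf X(r1),f1", "f1", "PR#5"),
]
map_table = {"f0": "PR#1", "f1": "PR#8", "f2": "PR#6", "r1": "PR#7"}
free_list = ["PR#2"]

serial_rollback(rob, map_table, free_list, keep=1)   # undo insns 3-5
print(map_table)   # f1 is back to PR#5, r1 is back to PR#4
print(free_list)   # ['PR#2', 'PR#8', 'PR#7']
```

The loop takes one iteration per undone instruction, which is the "slow, but simple" trade-off of Option I.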

| • | Faster precise state                                   |                       |
|---|--------------------------------------------------------|-----------------------|
|   | <ul> <li>Use for (low-confidence) branches</li> </ul>  |                       |
| • | Record state prior to predicted branch                 |                       |
|   | <ul> <li>Save copy of MapTable</li> </ul>              |                       |
|   | Save copy of ROB tail pointer                          | Why not both head and tail pointers? |
|   | <ul> <li>Save copy of FreeList head pointer</li> </ul> |                       |
| • | Mark RS entries as conditional (one bit per branch)    |                       |
|   |                                                        |                       |
| • | On mispredicted branch                                 |                       |
|   | <ul> <li>Restore checkpointed state</li> </ul>         |                       |
|   | FreeList retains                                       |                       |
|   | Clear RS entries that are conditional on mispredicted branch |                 |
|   | What about instructions that have already completed?   |                       |
| • | R10K implements 4 checkpoints                          |                       |
|   | Relationship to Smith and Pleszkun?                    |                       |
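
The checkpoint scheme above can be sketched as follows. This is an illustrative model only: the class and method names (`RenameState`, `Checkpoint`, `rename_dest`) are hypothetical, the free list is modelled as a fixed list with a moving head index (no wrap-around), and reservation-station clearing is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    map_table: dict   # copy of the Map Table
    rob_tail: int     # copy of the ROB tail pointer
    free_head: int    # copy of the FreeList head pointer


class RenameState:
    def __init__(self, map_table, free_regs):
        self.map_table = dict(map_table)
        self.free_regs = list(free_regs)  # contents stay put; only the head moves
        self.free_head = 0                # index of the next register to allocate
        self.rob_tail = 0                 # number of ROB entries allocated so far

    def rename_dest(self, reg):
        """Allocate a physical register and a ROB entry for a destination."""
        preg = self.free_regs[self.free_head]
        self.free_head += 1
        self.map_table[reg] = preg
        self.rob_tail += 1
        return preg

    def checkpoint(self):
        """Record state prior to a predicted branch."""
        return Checkpoint(dict(self.map_table), self.rob_tail, self.free_head)

    def restore(self, cp):
        """Single-step restoration on a mispredicted branch."""
        self.map_table = dict(cp.map_table)
        self.rob_tail = cp.rob_tail
        self.free_head = cp.free_head     # post-branch allocations become free again


state = RenameState({"f1": "PR#5", "r1": "PR#4"}, ["PR#7", "PR#8"])
cp = state.checkpoint()        # taken at the branch
state.rename_dest("r1")        # wrong-path insn gets PR#7
state.rename_dest("f1")        # wrong-path insn gets PR#8
state.restore(cp)              # misprediction detected
```

Restoring the head pointer is enough here because registers allocated after the branch are still sitting in the list just behind the head; nothing has to be copied back.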

| Feature                | P6                       | R10K                       |
|------------------------|--------------------------|----------------------------|
| Value storage          | ARF, ROB, RS             | PRF                        |
| Register read          | @D: ARF/ROB → RS         | @S: PRF → FU               |
| Register write         | @R: ROB → ARF            | @C: FU → PRF               |
| Speculative value free | @R: automatic (ROB)      | @R: overwriting insn       |
| Data paths             | ARF/ROB → RS             | PRF → FU                   |
|                        | RS → FU                  | FU → PRF                   |
|                        | FU → ROB                 |                            |
|                        | ROB → ARF                |                            |
| Precise state          | Simple: clear everything | Complex: serial/checkpoint |

| <ul> <li>R10K-style became popular in late 90's, early 00's</li> <li>E.g., MIPS R10K (duh), DEC Alpha 21264, Intel Pentium4</li> </ul> |
|---|
| <ul> <li>P6-style is perhaps making a comeback</li> </ul> |
| …, simplicity is important |

| All insns are easy in out-of-order                                    |                      |
|-----------------------------------------------------------------------|----------------------|
| Register inputs only                                                  |                      |
| Register renaming captures all dependences                            |                      |
| <ul> <li>Tags tell you exactly when you can execute</li> </ul>        |                      |
| except loads                                                          |                      |
| <ul> <li>Register and memory inputs (older stores)</li> </ul>         |                      |
| <ul> <li>Register renaming does not tell you all dependences</li> </ul> |                      |
| <ul> <li>Memory renaming (a little later)</li> </ul>                  |                      |
| <ul> <li>How do loads find older in-flight stores to same address (if any)?</li> </ul> |     |





| Stores                                                        |         |
|---------------------------------------------------------------|---------|
| Dispatch (D)                                                  |         |
| Allocate entry at SQ tail                                     |         |
| Execute (X)                                                   |         |
| Write address and data into corresponding SQ slot             |         |
| Retire (R)                                                    |         |
| Write address/data from SQ head to D\$, free SQ head          |         |
| Loads                                                         |         |
| Dispatch (D)                                                  |         |
| <ul> <li>Record current SQ tail as "load position"</li> </ul> |         |
| Execute (X)                                                   |         |
| Where the good stuff happens                                  |         |
| Retire (R)                                                    |         |
| Check for (ordering) exceptions                               |         |
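
The store and load steps listed above can be sketched as a small Python model. The slide leaves load execution as "where the good stuff happens"; the search below (youngest older store to the same address wins, otherwise read the D\$) is an assumption about what happens there, not something the slide states. All names (`StoreQueue`, `dispatch_load`, and so on) are hypothetical, and a dict stands in for the D\$.

```python
class StoreQueue:
    def __init__(self, memory):
        self.memory = memory   # dict standing in for the D$
        self.entries = []      # in-flight stores, oldest first: [addr, data]
        self.head = 0          # sequence number of the oldest in-flight store

    @property
    def tail(self):
        return self.head + len(self.entries)

    def dispatch_store(self):
        """D: allocate an entry at the SQ tail, return its slot number."""
        self.entries.append([None, None])
        return self.tail - 1

    def execute_store(self, slot, addr, data):
        """X: write address and data into the corresponding SQ slot."""
        self.entries[slot - self.head] = [addr, data]

    def retire_store(self):
        """R: write the SQ head to the D$, then free it."""
        addr, data = self.entries.pop(0)
        self.memory[addr] = data
        self.head += 1

    def dispatch_load(self):
        """D: record the current SQ tail as the "load position"."""
        return self.tail

    def execute_load(self, load_pos, addr):
        """X (assumed): search older stores, youngest first, then the D$.

        Assumes every older store address is already known; a store whose
        address is still None is silently skipped here.
        """
        for seq in range(load_pos - 1, self.head - 1, -1):
            st_addr, st_data = self.entries[seq - self.head]
            if st_addr == addr:
                return st_data             # forward from the store queue
        return self.memory.get(addr, 0)    # no match: read the D$


sq = StoreQueue({0x10: 1})
s0 = sq.dispatch_store()
pos = sq.dispatch_load()       # load is younger than s0, older than s1
s1 = sq.dispatch_store()
sq.execute_store(s0, 0x10, 42)
sq.execute_store(s1, 0x10, 99)
value = sq.execute_load(pos, 0x10)   # forwarded from s0, s1 is ignored
```

The load position is what lets the load ignore stores that are younger than it, even though they sit in the same queue.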



| <ul> <li>Why "" in "out-of-order"?</li> </ul>                         |                          |
|-----------------------------------------------------------------------|--------------------------|
| + Load can execute out-of-order with respect to (wrt) other loads     |                          |
| <ul> <li>Need to check for multiprocessor ordering violations (CS757)</li> </ul> |               |
| <ul> <li>+ Stores can eXecute out-of-order wrt other stores</li> </ul> |                         |
| + Can't let other cores see OoO stores in a multicore                 |                          |
| + Must Retire in order                                                |                          |
| <ul> <li>Loads must execute in-order wrt older stores to same address</li> </ul> |               |
| <ul> <li>Load execution requires knowledge of all older store addresses</li> </ul> |             |
| <ul> <li>Stall if store address not yet known</li> </ul>              |                          |
| + Simple                                                              |                          |
| <ul> <li>Restricts performance</li> </ul>                             |                          |
| <ul> <li>Used in P6 and EV-6</li> </ul>                               |                          |
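
The conservative rule above (stall a load until every older store address is known) reduces to a one-line check. The sketch below is illustrative only; `load_may_issue` and the convention that `None` means "address not yet computed" are assumptions made for the example.

```python
def load_may_issue(older_store_addrs):
    """Conservative scheduling: every older store address must be known.

    `older_store_addrs` lists the addresses of all older in-flight stores,
    with None standing for a store that has not computed its address yet.
    """
    return all(addr is not None for addr in older_store_addrs)


# One older store has not executed yet, so the load stalls,
# even if that store later turns out to write a different address.
stalled = not load_may_issue([0x10, None])
```

The second comment is the performance cost the slide refers to: the stall happens whether or not the unknown address would actually have conflicted.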







| Loads                                                          |  |
|----------------------------------------------------------------|--|
| Dispatch (D)                                                   |  |
| <ul> <li>Allocate entry at LQ tail</li> </ul>                  |  |
| Execute (X)                                                    |  |
| <ul> <li>Write address into corresponding LQ slot</li> </ul>   |  |
| Stores                                                         |  |
| Dispatch (D)                                                   |  |
| <ul> <li>Record current LQ tail as "store position"</li> </ul> |  |
| Execute (X)                                                    |  |
| <ul> <li>Where the good stuff happens</li> </ul>               |  |
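
A load queue along these lines might look like the sketch below. What a store does at execute is not spelled out on the slide; the check shown (find younger loads that already executed to the same address, which would then be flushed and replayed) is an assumption. The names are hypothetical and retirement of LQ entries is not modelled.

```python
class LoadQueue:
    def __init__(self):
        self.addrs = []    # one slot per dispatched load, oldest first

    def dispatch_load(self):
        """D: allocate an entry at the LQ tail, return its slot."""
        self.addrs.append(None)
        return len(self.addrs) - 1

    def execute_load(self, slot, addr):
        """X: write the address into the corresponding LQ slot."""
        self.addrs[slot] = addr

    def dispatch_store(self):
        """D: record the current LQ tail as the "store position"."""
        return len(self.addrs)

    def execute_store(self, store_pos, addr):
        """X (assumed): return slots of younger loads that already
        executed to the same address, i.e. ordering violations."""
        return [slot for slot in range(store_pos, len(self.addrs))
                if self.addrs[slot] == addr]


lq = LoadQueue()
older = lq.dispatch_load()
pos = lq.dispatch_store()          # loads dispatched from here on are younger
younger = lq.dispatch_load()
lq.execute_load(older, 0x10)
lq.execute_load(younger, 0x10)     # speculatively ran ahead of the store
violations = lq.execute_store(pos, 0x10)   # only the younger load is flagged
```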



| <ul> <li>Opportunistic scheduling better than conservative</li> </ul>        |               |
|------------------------------------------------------------------------------|---------------|
| + Avoids many unnecessary delays                                             |               |
| + 100-300 false dep/1K Instrs                                                |               |
| but can degrade performance                                                  |               |
| <ul> <li>Introduces few flushes, but each is much costlier than …</li> </ul> |               |
| <ul> <li>0-25 misspeculations/1K Instrs * 12-35 cycles (Alpha …)</li> </ul>  |               |
| Observe: loads/stores that cause violations are "stable"                     |               |
| <ul> <li>Dependences are mostly program based, program doesn't change</li> </ul> |           |
| <ul> <li>Scheduler is deterministic</li> </ul>                               |               |
| Exploit: intelligent load scheduling                                         |               |
| <ul> <li>Hybridize conservative and opportunistic</li> </ul>                 |               |
| <ul> <li>Predict which loads, or load/store pairs will cause violations</li> </ul> |         |
| Use conservative scheduling for those, opportunistic for the rest            |               |
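
As a quick worked example, the flush cost implied by the ranges quoted above is just their product. The helper name below is hypothetical, and pairing the two lower bounds and the two upper bounds is an assumption made to get a best case and a worst case.

```python
def flush_cycles_per_kilo_insn(misspecs_per_kilo, cycles_per_flush):
    """Cycles lost to memory-ordering flushes per 1K instructions."""
    return misspecs_per_kilo * cycles_per_flush


best = flush_cycles_per_kilo_insn(0, 12)     # 0 cycles per 1K instructions
worst = flush_cycles_per_kilo_insn(25, 35)   # 875 cycles per 1K instructions
worst_cpi_penalty = worst / 1000             # 0.875 extra cycles per instruction
```

At the upper end the penalty approaches one extra cycle per instruction, which is why opportunistic scheduling "can degrade performance" despite the small number of flushes.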

| <ul> <li>Store-blind prediction</li> </ul>                                      |                |
|---------------------------------------------------------------------------------|----------------|
| <ul> <li>Predict load only, wait for all older stores to execute</li> </ul>     |                |
| ± Simple, but a little too heavy handed                                         |                |
| Example: Alpha 21264                                                            |                |
| <ul> <li>Store-load pair prediction</li> </ul>                                  |                |
| <ul> <li>Predict load/store pair, wait for only one store to execute</li> </ul> |                |
| ± More complex, but minimizes delay                                             |                |
| <ul> <li>Store set prediction</li> </ul>                                        |                |
| Group loads and stores into dependent sets                                      |                |
| <ul> <li>Store-Set Table: load-PC → store-PC</li> </ul>                         |                |
| • Last Store Table: store-PC → SQ index of most recent instance                 |                |
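
The two tables named above can be sketched as a pair of dictionaries. This follows only the slide's two-line summary (load-PC → store-PC, store-PC → SQ index); real store-set hardware is more involved, and every name below (`StoreSetPredictor`, `train`, `load_must_wait_for`) is hypothetical.

```python
class StoreSetPredictor:
    def __init__(self):
        self.store_set = {}    # Store-Set Table: load PC -> store PC
        self.last_store = {}   # Last Store Table: store PC -> SQ index

    def train(self, load_pc, store_pc):
        """Called after an ordering violation between this load and store."""
        self.store_set[load_pc] = store_pc

    def store_dispatched(self, store_pc, sq_index):
        """Remember the most recent in-flight instance of this store."""
        self.last_store[store_pc] = sq_index

    def store_executed(self, store_pc, sq_index):
        """Clear the entry once that instance has executed."""
        if self.last_store.get(store_pc) == sq_index:
            del self.last_store[store_pc]

    def load_must_wait_for(self, load_pc):
        """Return the SQ index the load should wait on, or None."""
        store_pc = self.store_set.get(load_pc)
        if store_pc is None:
            return None        # no predicted dependence: schedule freely
        return self.last_store.get(store_pc)


pred = StoreSetPredictor()
pred.train(load_pc=0x400, store_pc=0x3F0)   # learned from a past violation
pred.store_dispatched(0x3F0, sq_index=5)
wait_on = pred.load_must_wait_for(0x400)    # load waits for SQ entry 5
```

Loads with no entry in the first table are scheduled opportunistically; loads with an entry wait only for the one predicted store rather than for all older stores.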

| Moshovos, et al.                                                    |    |
|---------------------------------------------------------------------|----|
| Memory Dependence Prediction Table (MDPT)                           |    |
| Identifies static load-store dependence                             |    |
| <ul> <li>LDPC, STPC, dependence DISTance, prediction</li> </ul>     |    |
| DIST identifies dynamic instance of dependent stor                  | re |
| Memory Dependence Synchronization Table (MDST)                      |    |
| <ul> <li>Used to synchronize dynamic instance in MDPT</li> </ul>    |    |
| Coordinate with instruction scheduler                               |    |
| `for (i=0; i<n-2; i++) {`                                           |    |
| `sum += X[i];`                                                      |    |
| `if (X[i] % 7 == 1) X[i+2] = X[i+2]/2; }`                           |    |
| Store sets will stall on each instance of load                      |    |
| <ul> <li>Implemented in Intel Nehalem/Haswell. Apple A7?</li> </ul> |    |
| See WARF v. Intel, WARF v. Apple                                    |    |
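
To see why a dependence distance is useful for the loop above, it helps to trace which loads of `X[i]` actually read a value written by an earlier iteration. The Python sketch below re-runs the loop and records, for each such load, how many iterations earlier the producing store ran. The function name is hypothetical, and `//` stands in for C integer division, which matches only for non-negative values.

```python
def dependence_distances(x):
    """Trace the loop and return the iteration distance of every
    store -> load dependence on X[i]."""
    n = len(x)
    last_writer = {}     # array index -> iteration that last stored to it
    distances = []
    for i in range(n - 2):
        if i in last_writer:               # the load of X[i] reads a stored value
            distances.append(i - last_writer[i])
        if x[i] % 7 == 1:                  # data-dependent store to X[i+2]
            x[i + 2] = x[i + 2] // 2
            last_writer[i + 2] = i
    return distances


print(dependence_distances([1, 0, 16, 0, 5, 0, 0]))   # [2, 2]
print(dependence_distances([0] * 7))                   # []
```

The dependence only exists on iterations where the condition held two iterations earlier, and when it exists the distance is always 2. That is the information the DIST field captures, whereas a scheme that makes the load wait on every instance pays the delay even when no store is coming.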



| <ul> <li>Does frequency vs. width tradeoff actually work?</li> </ul>        |                        |
|-----------------------------------------------------------------------------|------------------------|
| Yes in some places, no in others                                            |                        |
| + Yes: fetch, decode, rename, retire (all the in-order stages)              |                        |
| <ul> <li>No: issue, execute, complete (all the out-of-order stages)</li> </ul> |                     |
| What's the difference?                                                      |                        |
| Out-of-order: parallelism doesn't help if insns themselves serial           |                        |
| <ul> <li>2 dependent insns execute in 2 cycles, regardless of width</li> </ul> |                     |
| <ul> <li>In-order: inter-insn parallelism doesn't matter</li> </ul>         |                        |
| Intel Pentium4: multiple clock domains                                      |                        |
| <ul> <li>In-order stages run at 3.4 GHz, out-of-order stages at 6.8 GHz!</li> </ul> |                |
| • Frequency ∝ Power<sub>dynamic</sub> → high frequency only where necessary |                        |
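
The point about dependent instructions can be made concrete with a toy model. The sketch below assumes single-cycle latency for every instruction and an idealized machine that is limited only by width and dependences; `exec_cycles` is a hypothetical helper, not a performance model.

```python
import math


def exec_cycles(n_insns, width, dependent):
    """Cycles to execute n single-cycle insns on a `width`-wide machine."""
    if dependent:
        return n_insns                   # a serial chain ignores width
    return math.ceil(n_insns / width)    # independent insns fill the width


chain_narrow = exec_cycles(2, width=1, dependent=True)   # 2 cycles
chain_wide = exec_cycles(2, width=4, dependent=True)     # still 2 cycles

# Only a faster clock shortens the chain: same cycles, half the time.
time_slow = chain_wide / 3.4e9
time_fast = chain_wide / 6.8e9
```

Width helps the independent case (`exec_cycles(8, 4, False)` is 2 cycles instead of 8), but for a dependent chain only the clock rate of the out-of-order stages matters.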







| Feature             | Pentium III         | Pentium 4                 |
|---------------------|---------------------|---------------------------|
| Peak clock          | 800 MHz             | 3.4 GHz (6.8 internal)    |
| Pipeline stages     | 15                  | 22                        |
| Branch prediction   | 512 local + 512 BTB | 2K hybrid + 2K BTB        |
| Primary caches      | 16KB 4-way          | 8KB 4-way + 64KB T\$      |
| L2                  | 512KB-2MB           | 256KB-2MB                 |
| Fetch width         | 16 bytes            | 3 μops (16 bytes on miss) |
| Rename/retire width | 3 μops              | 3 μops                    |
| Execute width       | 5 μops              | 7 μops (X2)               |
| Register renaming   | P6                  | R10K                      |
| ROB/RS size         | 40/20               | 128/60                    |
| Load scheduling     | Conservative        | Intelligent               |
| Anything else?      | No                  | Hyperthreading            |

| New SRAMs consume a lot of power<br>• Re-order buffer, reservation stations, physical register file |
|-------------------------------------------------------------------------------------------------------------|
| New CAMs consume even more (relatively)<br>• Reservation stations, load/store queue |
| Is dynamic scheduling low-energy?<br>± Could be<br>• Does performance improvement offset power increase?<br>• Are there "deep sleep" modes? |
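
One way to frame the "could be" answer is that energy is power multiplied by time. The symbols $a$ (power ratio) and $s$ (speedup) are introduced here for illustration and do not appear in the slides:

$$
E = P \cdot t, \qquad
\frac{E_{\text{OoO}}}{E_{\text{iO}}}
= \frac{P_{\text{OoO}}}{P_{\text{iO}}} \cdot \frac{t_{\text{OoO}}}{t_{\text{iO}}}
= \frac{a}{s}
$$

Under this simple model, dynamic scheduling uses less energy only when $s > a$. With hypothetical numbers, a design that draws $a = 1.5\times$ the power but runs $s = 2\times$ faster uses $1.5 / 2 = 0.75$ of the energy. The model ignores idle and leakage power, which is where "deep sleep" modes would matter.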

| How does dynamic scheduling affect reliability? |
|-------------------------------------------------|
| What is the fault model?<br>• Transient faults (α-particles)? More transistors, more faults?<br>• Electro-migration faults? Same<br>– Permanent faults (design errors)? Worse, OoO is complicated |
| A holistic view of electrical reliability<br>• Vulnerability to electrical faults is a function of transistor size<br>• Mitigate (even eliminate) with larger transistors<br>• But larger transistors consume more power and energy<br>• Unless we slow them down |



| Why DIVA works |
|----------------|
| Re-execution acts like an in-order stage for parallelization purposes<br>• Can re-execute dependent insns in parallel!<br>• How come? "Dependence-free checking"<br>• You have original inputs and outputs of all insns<br>• Try working this out for yourself |
| What DIVA accomplishes<br>+ Detects transient errors in out-of-order stages<br>• Re-execution is parallel $\rightarrow$ slow clock, big, robust transistors<br>+ Can also detect design errors<br>• Re-execution (in-order) simpler than execution (out-of-order)<br>• Less likely to contain rare bugs |
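
The "dependence-free checking" idea can be sketched as follows. The trace format, the `OPS` table, and the function names are hypothetical, and the sketch covers only the computation check: it assumes each record holds the operand values the insn actually consumed and does not verify that those operands came from the right producer.

```python
import operator

# Hypothetical three-operation ISA used only for this sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def check(record):
    """Re-execute one insn from its recorded inputs and compare outputs."""
    op, in1, in2, out = record
    return OPS[op](in1, in2) == out


def faulty_insns(trace):
    """Indices of insns whose recorded output does not match re-execution.

    Each check reads only its own record, so the checks are independent
    of each other and could run in any order or all at once.
    """
    return [i for i, rec in enumerate(trace) if not check(rec)]


# r1 = 2 + 3; r2 = r1 * 4; r3 = r2 - 1 (a serial dependence chain)
trace = [("add", 2, 3, 5), ("mul", 5, 4, 20), ("sub", 20, 1, 19)]

# Same chain with a fault in the multiply; the subtract consumed the bad value.
corrupted = [("add", 2, 3, 5), ("mul", 5, 4, 21), ("sub", 21, 1, 20)]

print(faulty_insns(trace))      # no mismatches
print(faulty_insns(corrupted))  # only the multiply is flagged
```

Although the three insns form a chain in the original execution, no check waits on another check's result, which is why the re-execution can be wide and slowly clocked.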

| "Critical path modeling"<br>• Identify (and optimize) performance critical instructions |
|--------------------------------------------------------------------------------------------|
| "Scalable schedulers"<br>• Support for huge schedulers, several different designs |
| "Macro-ops and dataflow mini-graphs"<br>• Schedule groups of dependent insns at once (MG: also fetch, retire)<br>• More with fewer resources |
| "Out-of-order fetch and rename"<br>• Avoid branch mispredictions by fetching control independent insns |
| Much more |

| Modern dynamic scheduling must support precise state<br>• A software sanity issue, not a performance issue |
|-------------------------------------------------------------------------------------------------------------|
| Strategy: Writeback $\rightarrow$ Complete (OoO) + Retire (iO) |
| Two basic designs<br>• P6: Tomasulo + re-order buffer, copy-based register renaming<br>± Precise state is simple, but fast implementations are difficult<br>• R10K: implements true register renaming<br>± Easier fast implementations, but precise state is more complex |
| Memory scheduling<br>• Store queue: conservative load scheduling (iO wrt older stores)<br>• Load queue: opportunistic load scheduling (OoO wrt older stores)<br>• Intelligent memory scheduling: hybrid |
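
The Complete (OoO) + Retire (iO) split can be illustrated with a toy re-order buffer. The class and method names are assumptions for this sketch; it models only ordering, not register renaming or recovery.

```python
from collections import deque


class ReorderBuffer:
    """Insns enter in program order, complete in any order, retire in order."""

    def __init__(self):
        self.entries = deque()  # program order, oldest at the left

    def dispatch(self, name):
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry):
        entry["done"] = True  # may happen out of program order

    def retire(self):
        """Retire the completed prefix; stop at the first unfinished insn."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["name"])
        return retired


rob = ReorderBuffer()
a, b, c = (rob.dispatch(n) for n in ("a", "b", "c"))

rob.complete(c)
print(rob.retire())  # nothing retires: older insns a and b are unfinished
rob.complete(a)
print(rob.retire())  # a retires; b still blocks c
rob.complete(b)
print(rob.retire())  # b and c retire together, in program order
```

Because architectural state changes only at retire, the state seen by software always corresponds to a prefix of the program, which is the precise-state property the slide refers to.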

| Out-of-order execution: a performance technique |
|-------------------------------------------------|
| Easier/more effective in hardware than software (isn't everything?)<br>• Idea: make scheduling transparent to software |
| Feature I: Dynamic scheduling (iO $\rightarrow$ OoO)<br>• "Performance" piece: re-arrange insns into high-performance order<br>• Decode (iO) $\rightarrow$ dispatch (iO) + issue (OoO)<br>• Two algorithms: Scoreboard, Tomasulo |
| Feature II: Precise state (OoO $\rightarrow$ iO)<br>• "Correctness" piece: put insns back into program order<br>• Writeback (OoO) $\rightarrow$ complete (OoO) + retire (iO)<br>• Two designs: P6, R10K |
| Don't forget about memory scheduling |
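
As a reminder of what the memory scheduling piece involves, here is a minimal sketch of the conservative policy, in which a load issues in order with respect to older stores. The function name, the `(seq, executed)` store-queue representation, and the meaning of "executed" (address and data known) are assumptions for illustration.

```python
def load_may_issue(load_seq, store_queue):
    """Conservative load scheduling: wait for every older store.

    load_seq    -- program-order sequence number of the load
    store_queue -- list of (seq, executed) pairs for in-flight stores
    Younger stores (seq > load_seq) never block the load.
    """
    return all(executed for seq, executed in store_queue if seq < load_seq)


stores = [(2, True), (4, False)]   # store 4 has not executed yet
print(load_may_issue(3, stores))   # only store 2 is older, and it has executed
print(load_may_issue(5, stores))   # store 4 is older and still pending
```

An opportunistic policy would let the second load issue anyway and rely on a load queue to detect a conflict later; that check is not modeled here.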