# CS 758: Advanced Topics in Computer Architecture

Lecture #18: Accelerator Wall

Professor Matthew D. Sinclair

#### Announcements

- Midterm #2 is Tuesday night
  - Exam review on Tuesday in class
- Project lightning talks due the following week (12/3 at 9 AM)
- Project presentations 12/10

#### Moore's Law

[Figure: 42 Years of Microprocessor Trend Data. Log-scale plot (10<sup>0</sup> to 10<sup>7</sup>) versus year (1970 to 2020) of transistors (thousands), single-thread performance (SpecINT × 10<sup>3</sup>), frequency (MHz), typical power (watts), and number of logical cores.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2017 by K. Rupp.

#### Silver Bullet for Moore's Law?

#### Parallelism

[Figure: multicore die photo labeled with the memory controller, cores, QPI and uncore logic, and shared L3 cache.]

#### Specialization



### Fundamental tradeoff: Programmability vs. Efficiency



#### Issues

- Moore's Law is dead/dying
- Heterogeneous computing to the rescue?
- Just keep adding more ASICs?

## ASIC approach

- Little or no transistor scaling is coming ...
- ... so use the transistors in 3 easy steps:
  1. Identify a killer application (e.g., machine learning)
  2. Tape out an ASIC specialized for that application
  3. Profit!

How to determine gains from specialization?



Once Moore's Law ends, there is little left to gain from CMOS potential
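One way to make this question concrete is to split a chip's measured gain into a part explained by the process node and a remainder credited to specialization. The sketch below assumes that split is multiplicative; the function name and every number in it are hypothetical, chosen only for illustration.

```python
def specialization_return(total_gain: float, cmos_gain: float) -> float:
    """Return the part of a gain that is not explained by transistor scaling.

    Assumes total_gain = cmos_gain * specialization_return, i.e. a
    multiplicative split. This is an illustrative assumption, not a
    measured model.
    """
    if cmos_gain <= 0:
        raise ValueError("cmos_gain must be positive")
    return total_gain / cmos_gain


# Hypothetical example: a new accelerator is 16x faster than its predecessor,
# and the newer process node alone would account for 8x of that.
total = 16.0
from_cmos = 8.0
from_design = specialization_return(total, from_cmos)
print(f"scaling: {from_cmos}x, specialization: {from_design}x")
```

Under this split, if the CMOS-driven factor stops growing, the total gain can only grow through the specialization factor.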

### Gains from transistor scaling vs. specialization



- Trends:
  - Almost all of the true gains come from transistor scaling
  - It takes several design iterations to optimize an architecture for chip specialization return (CSR)

### Results

#### Common patterns

- Computational confinement: fixed and straightforward HW implementation
  - Issue: Amdahl's Law; the bottlenecks have already been sped up
  - Hard to improve further without drastic redesign (easy things done already)
- Massive parallelism
  - Issue: eventually dark silicon will come for us all (or at least our chips)
  - Eventually won't be able to increase the number of compute units further per mm<sup>2</sup>
- Domain maturity
  - Issue: once accelerators converge to optimal design, what else is left?
  - All domains will quickly become mature?
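The Amdahl's Law issue in the list above can be illustrated with a short calculation. This is a minimal sketch: the 90% accelerated fraction and the speedup values are hypothetical, picked only to show the diminishing returns.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's Law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / accelerator_speedup)


# Hypothetical workload: 90% of the runtime maps onto the accelerator.
fraction = 0.9
upper_bound = 1.0 / (1.0 - fraction)  # limit as accelerator speedup grows without bound

for s in (10, 100, 1000):
    print(f"accelerator {s:>4}x -> overall {amdahl_speedup(fraction, s):.2f}x")
print(f"upper bound: {upper_bound:.2f}x")
```

Each further 10x improvement in the accelerator buys less overall speedup, because the part that was not accelerated comes to dominate the runtime.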

Need to think bigger, bolder → break inherent assumptions

Applications and software are not static