### GPU Architectures: A CPU Perspective

Derek Hower, AMD Research

5/21/2013

Presented by Jason Power



### Data Parallel Execution on GPUs

Data Parallelism, Programming Models, SIMT



























Terminology Headache #1

It's common to interchange 'SIMD' and 'SIMT'















### GPU Programming Models

CUDA – **C**ompute **U**nified **D**evice **A**rchitecture

- Developed by Nvidia (proprietary)
- First serious GPGPU language/environment

OpenCL – **O**pen **C**omputing **L**anguage

- From makers of OpenGL
- Wide industry support: AMD, Apple, Qualcomm, Nvidia (begrudgingly), etc.

C++ AMP – **C++** **A**ccelerated **M**assive **P**arallelism

- Microsoft
- Much higher abstraction than CUDA/OpenCL

OpenACC – Open Accelerator

- Like OpenMP for GPUs (semi-auto-parallelize serial code)
- Much higher abstraction than CUDA/OpenCL

### OpenCL

Early CPU languages were light abstractions of physical hardware

Early GPU languages are light abstractions of physical hardware

- OpenCL + CUDA

### GPU Architecture


### OpenCL Model





































A Rose by Any Other Name...









Advanced Topics

GPU Limitations, Future of GPGPU







### Branch Divergence

When control flow diverges, all lanes take all paths

Divergence Kills Performance
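The cost of "all lanes take all paths" can be sketched with a small host-side model in plain C. This is an illustration under simplifying assumptions, not how any particular GPU accounts for cycles: the function name `wavefront_branch_cycles` and the per-path cycle counts are hypothetical, and the model assumes a wavefront issues each side of an if/else once if at least one lane needs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified SIMT cost model (hypothetical, for illustration only).
 * takes_then[i] says whether lane i takes the "then" side of an if/else.
 * The wavefront pays for a side if ANY lane takes it, so a diverged
 * wavefront pays for both sides while a uniform one pays for only one.
 */
unsigned wavefront_branch_cycles(const bool *takes_then, size_t lanes,
                                 unsigned then_cycles, unsigned else_cycles)
{
    assert(lanes > 0);

    bool any_then = false;
    bool any_else = false;
    for (size_t i = 0; i < lanes; ++i) {
        if (takes_then[i])
            any_then = true;
        else
            any_else = true;
    }

    unsigned cycles = 0;
    if (any_then)
        cycles += then_cycles;   /* lanes on the else side are masked off */
    if (any_else)
        cycles += else_cycles;   /* lanes on the then side are masked off */
    return cycles;
}
```

In this model a single lane that disagrees with the rest is enough to make the whole wavefront pay for both paths, which is why divergence is costly even when it is rare.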

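Divergence isn't just a performance problem:

```
__global int lock = 0;

void mutex_lock(...)
{
    // acquire lock
    // test_and_set() is pseudocode for an atomic test-and-set; as used
    // here it is assumed to return true once the caller holds the lock
    while (test_and_set(lock, 1) == false) {
        // spin
    }
    return;
}
```

Deadlock: work-items can't enter mutex together! When a wavefront executes in lockstep, the work-item that wins the lock cannot move on to release it until the other work-items in its wavefront leave the spin loop, which they never do.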







### Memory Divergence

One work-item stalls → entire wavefront must stall

- Cause: bank conflicts, cache misses

Data layout & partitioning is important
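Why data layout matters can be sketched by counting how many distinct cache lines one wavefront-wide load touches. The helper names `lines_touched` and `strided_addresses`, the 4-byte element size, and the line size passed in by the caller are all assumptions made for this illustration; real hardware also has banking and coalescing rules that this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the distinct cache lines touched when every lane of a wavefront
 * loads from its own byte address. More lines means more memory requests,
 * and the wavefront waits for the slowest one. (Hypothetical model.)
 */
size_t lines_touched(const uint64_t *addr, size_t lanes, uint64_t line_bytes)
{
    assert(line_bytes > 0);

    size_t count = 0;
    for (size_t i = 0; i < lanes; ++i) {
        uint64_t line = addr[i] / line_bytes;
        int seen = 0;
        /* Quadratic scan is fine for wavefront-sized inputs. */
        for (size_t j = 0; j < i; ++j) {
            if (addr[j] / line_bytes == line) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            ++count;
    }
    return count;
}

/*
 * Fill addr[] for lanes that each read one 4-byte element, where lane i
 * reads element i * stride_elems starting at byte address base.
 */
void strided_addresses(uint64_t *addr, size_t lanes,
                       uint64_t base, uint64_t stride_elems)
{
    for (size_t i = 0; i < lanes; ++i)
        addr[i] = base + (uint64_t)i * stride_elems * 4u;
}
```

With 16 lanes and 64-byte lines, a unit stride keeps the whole wavefront inside one line, while a stride of 16 elements spreads the same load over 16 lines. Choosing a layout in which neighboring work-items read neighboring elements is the usual way to stay in the first case.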







### GPU Coherence?

Notice: GPU consistency model does not require coherence

- i.e., Single Writer, Multiple Reader

Marketing claims they are coherent...

GPU "Coherence":

- Nvidia: disable private caches
- AMD: flush/invalidate entire cache at fences



