# Homework 5


Due 04/13
Weight: 15%

Problem 1 must be done with your project partner.

Submitted:

• Problems 1 & 2
• Electronic submission of files: submit to learn@UW, one submission per pair, titled `hw5.tar`.

Not Submitted:

• Problems 3–10: these problems are optional and will not be graded, but they are recommended for a better understanding of the course material.

Overview

• Problems 1 & 2 involve writing assembly programs to demonstrate the functionality of a pipelined processor.

### Problem 1

a) Write an assembly program to demonstrate forwarding in a pipelined processor implementation. Write your code in `p1.asm`.

b) Also write an explanation of your program including where and why forwarding takes place. Write your answer in `p1.txt`.

### Problem 2

a) Write an assembly program to demonstrate why branch prediction is necessary and useful. Write your code in `p2.asm`.

b) Write an explanation of your program and how branch prediction helps in `p2.txt`.

c) Will branch prediction always take only 1 cycle? Write your answer in `p2.txt`.

The remaining problems will not be graded but are recommended for better understanding of the course material.

### Problem 3

Given a 2 KB, 2-way set-associative cache with 16-byte lines and the following code:

```
for (int i = 0; i < 1000; i++) {
    A[i] = 40 * B[i];
}
```

a) Compute the overall miss rate (assume array entries are one word each, each word is 4 bytes, and the base address of each array is aligned to a cache-line boundary).

b) What kind of cache locality is being exploited?
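One way to check your answer is with a small simulation. The sketch below models a 2 KB, 2-way set-associative LRU cache with 16-byte lines and replays the loop's memory accesses (a 4-byte load of `B[i]` followed by a 4-byte store to `A[i]`); the base addresses `A_BASE` and `B_BASE` are arbitrary line-aligned values chosen for illustration, not given by the problem.

```python
# Sketch: simulate the loop's accesses on a 2 KB, 2-way, 16-byte-line cache.
LINE = 16
SETS = 2048 // LINE // 2          # 2 KB / 16-byte lines / 2 ways = 64 sets

A_BASE, B_BASE = 0x0000, 0x10000  # assumed line-aligned base addresses

cache = [[] for _ in range(SETS)] # each set: up to 2 tags in LRU order, oldest first
misses = accesses = 0

def access(addr):
    global misses, accesses
    accesses += 1
    line = addr // LINE
    s, tag = line % SETS, line // SETS
    ways = cache[s]
    if tag in ways:
        ways.remove(tag)          # hit: refresh LRU position
    else:
        misses += 1
        if len(ways) == 2:
            ways.pop(0)           # evict the least-recently-used tag
    ways.append(tag)

for i in range(1000):
    access(B_BASE + 4 * i)        # load B[i]
    access(A_BASE + 4 * i)        # store A[i]

print(f"miss rate = {misses / accesses:.2%}")
```

Since each 16-byte line holds four 4-byte words, each array misses once per four consecutive accesses, which is what the simulation should confirm.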

### Problem 4

Consider a direct-mapped cache with 32-byte blocks and a total capacity of 512 bytes in a system with a 32-bit address space. Assume the machine is byte-addressable.

1. Indicate which bits of an address in this machine correspond to the tag, index, and offset, respectively.
2. For the sequence of addresses below, indicate which references will result in cache hits and which will result in cache misses. If a reference results in a miss, mark whether the miss is compulsory, capacity, or conflict. Assume the cache is initially empty (all valid bits are set to 0).
3. Show the final contents of the address tags at the end of execution.
4. Explain what can be done to improve each type of miss.
```
0x0000a796
0x000092e8
0x000092f4
0x00004182
0x0000780a
0x0000a690
0x0000408e
0x0000a798
0x00007800
0x000092fc
0x00027c02
0x0000408a
0x00004198
0x00006710
0x0000670c
0x00027c04
0x0000a790
```
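The field split follows mechanically from the parameters: 32-byte blocks give a 5-bit offset, 512/32 = 16 blocks give a 4-bit index, and the remaining 23 bits are the tag. As a checking aid (not the required worked answer, and without the miss classification, which you should reason out by hand), the sketch below decomposes each address and replays the trace through the direct-mapped cache:

```python
# Sketch: tag/index/offset split and hit/miss replay for the 512 B direct-mapped cache.
OFFSET_BITS, INDEX_BITS = 5, 4     # 32-byte blocks; 512/32 = 16 blocks

trace = [
    0x0000A796, 0x000092E8, 0x000092F4, 0x00004182, 0x0000780A, 0x0000A690,
    0x0000408E, 0x0000A798, 0x00007800, 0x000092FC, 0x00027C02, 0x0000408A,
    0x00004198, 0x00006710, 0x0000670C, 0x00027C04, 0x0000A790,
]

tags = [None] * (1 << INDEX_BITS)  # one tag per line; None means invalid
hits = misses = 0
for addr in trace:
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    if tags[index] == tag:
        hits += 1
        result = "hit"
    else:
        misses += 1
        result = "miss"
        tags[index] = tag          # fill or replace the line
    print(f"{addr:#010x}: tag={tag:#x} index={index:2d} offset={offset:2d} -> {result}")
print(f"{hits} hits, {misses} misses")
```

The final contents of `tags` at the end of the loop answer part 3.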

### Problem 5

Re-do problem 4, but using a two-way set-associative cache. When replacing a block, the least-recently-used block is chosen for replacement. Everything else (block size and total capacity) remains the same.

Determine the speedup over the direct-mapped cache in problem 4. Assume both caches can be accessed in 1 cycle, that the CPI without misses is 1.0, and that the miss penalty is 25 cycles.
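Under one simple reading (each reference in the preceding address trace counts as one instruction making one memory access), execution time scales with 1 + miss rate × 25. The sketch below replays the trace through both organizations and compares them; that per-instruction accounting is an assumption, so adapt the speedup formula if your course uses a different one.

```python
# Sketch: direct-mapped vs. 2-way LRU (512 B total, 32 B blocks) on the same trace.
# Speedup model (an assumption): time per reference = 1 + miss_rate * 25 cycles.
trace = [
    0x0000A796, 0x000092E8, 0x000092F4, 0x00004182, 0x0000780A, 0x0000A690,
    0x0000408E, 0x0000A798, 0x00007800, 0x000092FC, 0x00027C02, 0x0000408A,
    0x00004198, 0x00006710, 0x0000670C, 0x00027C04, 0x0000A790,
]

def simulate(sets, ways):
    cache = [[] for _ in range(sets)]   # per set: tags in LRU order, oldest first
    misses = 0
    for addr in trace:
        block = addr >> 5               # 32-byte blocks
        s, tag = block % sets, block // sets
        if tag in cache[s]:
            cache[s].remove(tag)        # hit: refresh LRU position
        else:
            misses += 1
            if len(cache[s]) == ways:
                cache[s].pop(0)         # evict the least-recently-used tag
        cache[s].append(tag)
    return misses

m_dm = simulate(16, 1) / len(trace)     # direct-mapped: 16 sets of 1 way
m_2w = simulate(8, 2) / len(trace)      # 2-way: 8 sets of 2 ways
speedup = (1 + m_dm * 25) / (1 + m_2w * 25)
print(f"direct-mapped miss rate {m_dm:.3f}, 2-way miss rate {m_2w:.3f}, speedup {speedup:.3f}")
```

Note that halving the number of sets also changes which addresses conflict, so more associativity is not automatically fewer misses on a given trace; check what the simulation reports before assuming a speedup greater than 1.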

### Problem 6

Consider a cache with the following characteristics:

• 32-byte blocks
• 5-way set associative
• 1024 sets
• writeback
• LRU replacement policy
1. How many bytes of data storage are there?
2. What is the total number of bits needed to implement the cache?
3. Make a picture similar to the one on page 486 of the text. (As with the picture in the text, include the hit and data logic.)
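For parts 1 and 2, the storage arithmetic can be sketched as follows. The 32-bit address width and the LRU encoding (a 3-bit field per line, enough to rank 5 ways) are assumptions, since the problem does not pin them down; adjust to whatever convention your course uses.

```python
# Sketch: storage accounting for a 5-way, 1024-set, writeback cache with 32-byte blocks.
# Assumes 32-bit addresses and 3 LRU bits per line (both assumptions, not given).
import math

BLOCK, WAYS, SETS, ADDR_BITS = 32, 5, 1024, 32

data_bytes = BLOCK * WAYS * SETS                   # pure data storage
offset_bits = int(math.log2(BLOCK))                # 5
index_bits = int(math.log2(SETS))                  # 10
tag_bits = ADDR_BITS - offset_bits - index_bits    # 17

lines = WAYS * SETS
per_line = BLOCK * 8 + tag_bits + 1 + 1 + 3        # data + tag + valid + dirty + LRU
total_bits = lines * per_line

print(f"data storage: {data_bytes} bytes ({data_bytes // 1024} KB)")
print(f"tag = {tag_bits} bits/line; total = {total_bits} bits")
```

The dirty bit appears because the cache is writeback; a write-through cache would not need it.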

### Problem 7

How many storage bits are required to implement a 256 KB cache with 16 B blocks that is 4-way set-associative and uses a write-back policy and LRU replacement, assuming a 2^36-byte addressable address space?

Bits are required for:

1. The data
2. The tags
3. The valid bits
4. The dirty bits
5. The LRU bits
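The same per-component accounting can be sketched as below. The 2-bit-per-line LRU encoding is an assumption (exact LRU state for 4 ways can be encoded in other ways, e.g. per set rather than per line), so state your own convention in your answer.

```python
# Sketch: storage bits for a 256 KB, 4-way, 16 B-block, write-back cache
# in a 2^36-byte address space. The LRU encoding (2 bits/line) is an assumption.
import math

CAPACITY, BLOCK, WAYS, ADDR_BITS = 256 * 1024, 16, 4, 36

lines = CAPACITY // BLOCK                        # 16384 lines
sets = lines // WAYS                             # 4096 sets
offset_bits = int(math.log2(BLOCK))              # 4
index_bits = int(math.log2(sets))                # 12
tag_bits = ADDR_BITS - offset_bits - index_bits  # 20

data_bits = CAPACITY * 8
tag_total = lines * tag_bits
valid_total = lines * 1
dirty_total = lines * 1      # needed because the policy is write-back
lru_total = lines * 2        # assumed 2-bit LRU state per line

print(f"data {data_bits}, tags {tag_total}, valid {valid_total}, "
      f"dirty {dirty_total}, LRU {lru_total} bits")
```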

### Problem 8

Do problems 5.4.1 to 5.4.3 on page 551 of the textbook.

### Problem 9

Do problems 5.7.1 to 5.7.3 on page 554 of the textbook.

### Problem 10

Consider a processor running at 2 GHz with a base CPI of 1.0 (the CPI without considering memory-access delays, stalls, etc.). About 30% of the instructions in a program access data memory; the access delay of instruction memory is ignored. The data memory access time is 100 ns, including miss handling. The primary (L1) cache has a hit rate of 99% and no access penalty on a hit. Now consider adding an L2 cache between the L1 cache and main memory. Suppose the L2 cache has a miss ratio of 20% and an access delay of 5 ns. How much performance improvement does the L2 cache provide compared to not having it?
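The arithmetic can be checked with a short sketch. It assumes the full 100 ns memory time is charged on every L1 miss when there is no L2, and on every L2 miss when there is one (with the 5 ns L2 probe as the only extra cost the L2 adds); that is one common reading of such problems, so adjust if your course charges the penalties differently.

```python
# Sketch: CPI with and without an L2 cache, under the assumptions stated above.
CLOCK_NS = 0.5            # 2 GHz -> 0.5 ns per cycle
MEM_ACCESS_FRAC = 0.30    # fraction of instructions accessing data memory
L1_MISS = 0.01            # L1 hit rate is 99%
L2_MISS = 0.20
MEM_NS, L2_NS = 100, 5

mem_cycles = MEM_NS / CLOCK_NS    # 200 cycles to memory
l2_cycles = L2_NS / CLOCK_NS      # 10 cycles to L2

cpi_no_l2 = 1.0 + MEM_ACCESS_FRAC * L1_MISS * mem_cycles
cpi_l2 = 1.0 + MEM_ACCESS_FRAC * L1_MISS * (l2_cycles + L2_MISS * mem_cycles)
print(f"CPI without L2 = {cpi_no_l2:.2f}, with L2 = {cpi_l2:.2f}, "
      f"speedup = {cpi_no_l2 / cpi_l2:.3f}")
```

Since the clock rate is unchanged, the performance improvement is just the ratio of the two CPIs.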