# MEMORY: PAGING AND TLBS

Shivaram Venkataraman
CS 537, Spring 2019

### **ADMINISTRIVIA**

- Project 1b is due Friday
- Project 1a grades are out
- Project 2a going out tomorrow
- Discussion section: Process API, Project 2a

### AGENDA / LEARNING OUTCOMES

Memory virtualization

- What is paging and how does it work?
- What are some of the challenges in implementing paging?

# **RECAP**

### MEMORY VIRTUALIZATION

- Sharing: Enable sharing between cooperating processes
- Protection: Cannot corrupt OS or other process memory
- Efficiency: Do not waste memory or slow down processes

# **ABSTRACTION: ADDRESS SPACE**

# SEGMENTATION IMPLEMENTATION

### QUIZ: ADDRESS TRANSLATIONS WITH SEGMENTATION

14-bit logical address: 2 bits for the segment, 12 bits (3 hex digits) for the offset

| Segment | Base   | Bounds | R W |
|---------|--------|--------|-----|
| 0       | 0x2000 | 0x6ff  | 1 0 |
| 1       | 0x0000 | 0x4ff  | 1 1 |
| 2       | 0x3000 | 0xfff  | 1 1 |
| 3       | 0x0000 | 0x000  | 0 0 |

Remember: 1 hex digit → 4 bits

Translate logical (in hex) to physical. For example, offset 0x65c in segment 2: 0x3000 + 0x065c = 0x365c

# REVIEW: MEMORY ACCESSES

Instructions (%rip: 0x0010):

- 0x0010: movl 0x1100, %edi
- 0x0013: addl \$0x3, %edi
- 0x0019: movl %edi, 0x1100

| Seg | Base   | Bounds |
|-----|--------|--------|
| 0   | 0x4000 | 0xfff  |
| 1   | 0x5800 | 0xfff  |
| 2   | 0x6800 | 0x7ff  |

1. Fetch instruction at logical addr 0x0010. Physical addr: 0x4010
2. Exec, load from logical addr 0x1100. Physical addr: 0x5900
3. Fetch instruction at logical addr 0x0013. Physical addr: 0x4013
4. Exec, no load
5. Fetch instruction at logical addr 0x0019. Physical addr: 0x4019
6. Exec, store to logical addr 0x1100. Physical addr: 0x5900

### ADVANTAGES OF SEGMENTATION

Enables sparse allocation of address space

Stack and heap can grow independently

- Heap: If no data on free list, dynamic memory allocator requests more from OS (e.g., UNIX: malloc calls sbrk())
- Stack: OS recognizes reference outside legal segment, extends stack implicitly

Different protection for different segments

- Enables sharing of selected segments
- Read-only status for code

Supports dynamic relocation of each segment

## DISADVANTAGES OF SEGMENTATION

Each segment must be allocated contiguously

May not have sufficient physical memory for large segments?

**External Fragmentation**

[Figure: physical memory from 0KB to 64KB, not compacted: the operating system, allocated regions, and regions not in use interleaved]

## REVIEW: MATCH DESCRIPTION

| Description | Name of approach |
|-------------|------------------|
| 1. one process uses RAM at a time | |
| 2. rewrite code and addresses before running | |
| 3. add per-process starting location to virt addr to obtain phys addr | |
| 4. dynamic approach that verifies address is in valid range | |
| 5. several base+bound pairs per process | |

Candidates: Segmentation, Static Relocation, Base, Base+Bounds, Time Sharing

# **PAGING**

### **FRAGMENTATION**

Definition: Free memory that can't be usefully allocated

Types of fragmentation

- External: Visible to allocator (e.g., OS)
- Internal: Visible to requester

# **PAGING**

Goal: Eliminate requirement that address space is contiguous

- Eliminate external fragmentation
- Grow segments as needed

Idea: Divide address spaces and physical memory into fixed-sized pages

- Size: 2<sup>n</sup>, Example: 4KB

### TRANSLATION OF PAGE ADDRESSES

How to translate logical address to physical address?

- High-order bits of address designate page number
- Low-order bits of address designate offset within page

No addition needed; just append bits correctly...
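The split into page number and offset can be sketched in C. This is a minimal illustration, not code from the lecture: it assumes 4 KB pages (12 offset bits) and 32-bit addresses, and the names `OFFSET_BITS`, `vpn_of`, `offset_of`, and `make_paddr` are hypothetical. The physical page number is simply passed in, since how the OS finds it is a separate question.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 12u                        /* assumption: 4 KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u) /* low 12 bits set */

/* High-order bits of the address: the virtual page number. */
uint32_t vpn_of(uint32_t vaddr) {
    return vaddr >> OFFSET_BITS;
}

/* Low-order bits of the address: the offset within the page. */
uint32_t offset_of(uint32_t vaddr) {
    return vaddr & OFFSET_MASK;
}

/* Append the offset to a physical page number. No addition is needed:
 * the two bit fields never overlap, so a bitwise OR is enough. */
uint32_t make_paddr(uint32_t ppn, uint32_t offset) {
    assert(offset <= OFFSET_MASK); /* offset must fit inside one page */
    return (ppn << OFFSET_BITS) | offset;
}
```

For example, `vpn_of(0x1100)` is 1 and `offset_of(0x1100)` is 0x100; if that page happened to live in physical page 7 (a made-up mapping), `make_paddr(7, 0x100)` gives 0x7100.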
# ADDRESS FORMAT

Given known page size, how many bits are needed in address to specify offset in page?

| Page Size | Low Bits (offset) |
|-----------|-------------------|
| 16 bytes  | 4 |
| 1 KB      | 10 (2<sup>10</sup> = 1 KB) |
| 1 MB      | 20 |
| 512 bytes | 9 |
| 4 KB      | 12 |

# ADDRESS FORMAT

Given number of bits in virtual address and bits for offset, how many bits for virtual page number?

| Page Size | Low Bits (offset) | Virt Addr Bits | High Bits (vpn) |
|-----------|-------------------|----------------|-----------------|
| 16 bytes  | 4  | 10 | 6  |
| 1 KB      | 10 | 20 | 10 |
| 1 MB      | 20 | 32 | 12 |
| 512 bytes | 9  | 16 | 7  |
| 4 KB      | 12 | 32 | 20 |

# **ADDRESS FORMAT**

Given number of bits for vpn, how many virtual pages can there be in an address space?

| Page Size | Low Bits (offset) | Virt Addr Bits | High Bits (vpn) | Virt Pages |
|-----------|-------------------|----------------|-----------------|------------|
| 16 bytes  | 4  | 10 | 6  | 2<sup>6</sup> = 64 |
| 1 KB      | 10 | 20 | 10 | 2<sup>10</sup> = 1024 |
| 1 MB      | 20 | 32 | 12 | 2<sup>12</sup> = 4096 |
| 512 bytes | 9  | 16 | 7  | 2<sup>7</sup> = 128 |
| 4 KB      | 12 | 32 | 20 | 2<sup>20</sup> = 1 M |

# VIRTUAL -> PHYSICAL PAGE MAPPING

Number of bits in virtual address need not equal number of bits in physical address (e.g., a 64-bit virtual address covers 2<sup>64</sup> bytes)

How should OS translate VPN to PPN?

### **PAGETABLES**

# PER-PROCESS PAGETABLE

# FILL IN PAGETABLE

# **QUIZ: HOW BIG IS A PAGETABLE?**

How big is a typical page table?

- assume 32-bit address space
- assume 4 KB pages = 12-bit offset
- assume 4-byte entries

Size = number of entries × size of each entry

- Number of entries = 2<sup>20</sup> (20 bits of VPN) = 1 M virtual pages
- Size = 1 M entries × 4 bytes = 4 MB

### WHERE ARE PAGETABLES STORED?

Page tables are kept by the kernel (OS)

Implication: Store each page table in memory

Hardware finds page table base with register (e.g., CR3 on x86)

What happens on a context-switch?
- Change contents of page table base register to newly scheduled process
- Save old page table base register in PCB of descheduled process

## OTHER PAGETABLE INFO

What other info is in pagetable entries besides translation?

- valid bit
- protection bits
- present bit (needed later)
- reference bit (needed later)
- dirty bit (needed later)

Pagetable entries are just bits stored in memory

- Agreement between hw and OS about interpretation

# MEMORY ACCESSES WITH PAGING

0x0010: movl 0x1100, %edi

- Assume PT is at phys addr 0x5000
- Assume PTE's are 4 bytes
- Assume 4KB pages
- How many bits for offset? 12

Simplified view of page table (ppn for vpn 0, 1, 2, 3): 2, 0, 80, 99

Fetch instruction at logical addr 0x0010

- Access page table to get ppn for vpn 0
- Mem ref 1: To the pagetable, at 0x5000
- Learn vpn 0 is at ppn 2
- Fetch instruction at 0x2010 (Mem ref 2): ppn 2 followed by offset 0x010

Exec, load from logical addr 0x1100

- Access page table to get ppn for vpn 1
- Mem ref 3: To the pagetable, at 0x5004 (0x5000 + 1 × 4 bytes)
- Learn vpn 1 is at ppn 0
- Movl from 0x0100 into reg (Mem ref 4)

### **SUMMARY: PAGING**

### ADVANTAGES OF PAGING

#### No external fragmentation

- Any page can be placed in any frame in physical memory
- Fast to allocate and free
  - Alloc: No searching for suitable free space
  - Free: Doesn't have to coalesce with adjacent free space

#### Simple to swap-out portions of memory to disk (later lecture)

- Page size matches disk block size
- Can run process when some pages are on disk
- Add "present" bit to PTE

### DISADVANTAGES OF PAGING

Internal fragmentation: Page size may not match size needed by process

- Wasted memory grows with larger pages
- Tension?

Additional memory reference to page table $\rightarrow$ Very inefficient

- Page table must be stored in memory
- MMU stores only base address of page table

Simple page table: Requires PTE for all pages in address space

- Entry needed even if page not allocated?

### PAGING TRANSLATION STEPS

#### For each mem reference:

1. extract **VPN** (virt page num) from **VA** (virt addr)
2. calculate addr of **PTE** (page table entry)
3. read **PTE** from memory
4. extract **PFN** (page frame num)
5. build **PA** (phys addr)
6. read contents of **PA** from memory into register

Which expensive step will we avoid next?

### **EXAMPLE: ARRAY ITERATOR**

```
/* a is an array of N ints */
int sum = 0;
for (i = 0; i < N; i++) {
    sum += a[i];
}
```

Assume 'a' starts at 0x3000

Ignore instruction fetches and access to 'i'

What virtual addresses?

- load 0x3000
- load 0x3004
- load 0x3008
- load 0x300C

What physical addresses? Each access first loads the PTE (at 0x100C), then the data:

- load 0x100C, load 0x7000
- load 0x100C, load 0x7004
- load 0x100C, load 0x7008
- load 0x100C, load 0x700C

### STRATEGY: CACHE PAGE TRANSLATIONS

TLB: TRANSLATION LOOKASIDE BUFFER

### TLB ORGANIZATION

#### Fully associative

- Any given translation can be anywhere in the TLB
- Hardware will search the entire TLB in parallel

### ARRAY ITERATOR (W/TLB)

```
int sum = 0;
for (i = 0; i < 2048; i++) {
    sum += a[i];
}
```

Assume 'a' starts at 0x1000

Ignore instruction fetches and access to 'i'

Assume following virtual address stream:

- load 0x1000
- load 0x1004
- load 0x1008
- load 0x100C
- ...

What will TLB behavior look like?

# TLB ACCESSES: SEQUENTIAL EXAMPLE

### PERFORMANCE OF TLB?

```
int sum = 0;
for (i = 0; i < 2048; i++) {
    sum += a[i];
}
```

Would hit rate get better or worse with smaller pages?

```
Miss rate of TLB: #TLB misses / #TLB lookups

#TLB lookups? number of accesses to a = 2048

#TLB misses? = number of unique pages accessed
             = 2048 / (elements of 'a' per 4K page)
             = 2K / (4K / sizeof(int)) = 2K / 1K
             = 2
```

Miss rate? = 2/2048 = 0.1%

Hit rate? (1 - miss rate) = 99.9%

### TLB PERFORMANCE

How can system improve hit rate given fixed number of TLB entries?

Increase page size: Fewer unique page translations needed to access same amount of memory

TLB Reach: Number of TLB entries \* Page Size

### TLB PERFORMANCE WITH WORKLOADS

Sequential array accesses almost always hit in TLB. Very fast!

What access pattern will be slow?
Highly random, with no repeat accesses

### **WORKLOAD ACCESS PATTERNS**

#### Workload A

```
int sum = 0;
for (i = 0; i < 2048; i++) {
    sum += a[i];
}
```

#### Workload B

```
int sum = 0;
srand(1234);
for (i = 0; i < 1000; i++) {
    sum += a[rand() % N];
}
srand(1234);
for (i = 0; i < 1000; i++) {
    sum += a[rand() % N];
}
```

### **WORKLOAD LOCALITY**

**Spatial Locality**: future access will be to nearby addresses

**Temporal Locality**: future access will be repeats to the same data

What TLB characteristics are best for each type?

#### Spatial:

- Access same page repeatedly; need same vpn → ppn translation
- Same TLB entry re-used

#### Temporal:

- Access same address near in future
- Same TLB entry re-used in near future
- How near in future? How many TLB entries are there?

### OTHER TLB CHALLENGES

- How to replace TLB entries? LRU? Random?
- TLB on context switches? HW or OS?

### **NEXT STEPS**

- Project 1b: Due tomorrow!
- Project 2a: Out tomorrow
- Discussion today: Process API, Project 2a
- Next class: More TLBs and better pagetables!
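For review before next class, the TLB miss count for the sequential array workload can be checked with a small simulation. This is only a sketch under stated assumptions, not how any real TLB is built: it assumes 4 KB pages, 4-byte array elements, a 4-entry fully associative TLB, and LRU replacement (the lecture leaves the replacement policy open). The names `tlb_t`, `tlb_access`, and `sequential_misses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u /* assumption: 4 KB pages */
#define TLB_ENTRIES 4u  /* assumption: tiny fully associative TLB */
#define ELEM_SIZE   4u  /* assumption: sizeof(int) == 4 */

typedef struct {
    uint32_t vpn[TLB_ENTRIES];
    uint64_t last_used[TLB_ENTRIES]; /* 0 means the slot is empty */
    uint64_t clock;                  /* counts accesses, used for LRU */
    uint64_t lookups;
    uint64_t misses;
} tlb_t;

/* Look up the page that holds vaddr; on a miss, replace the
 * least recently used (or an empty) entry. */
void tlb_access(tlb_t *t, uint32_t vaddr) {
    assert(t != NULL);
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    size_t victim = 0;

    t->lookups++;
    t->clock++;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (t->last_used[i] != 0 && t->vpn[i] == vpn) {
            t->last_used[i] = t->clock; /* hit: refresh recency */
            return;
        }
        if (t->last_used[i] < t->last_used[victim])
            victim = i;                 /* oldest or empty slot so far */
    }
    t->misses++; /* miss: hardware or OS would walk the page table here */
    t->vpn[victim] = vpn;
    t->last_used[victim] = t->clock;
}

/* Count TLB misses for a sequential scan of n elements starting at base. */
uint64_t sequential_misses(uint32_t base, uint32_t n) {
    tlb_t t = {0};
    for (uint32_t i = 0; i < n; i++)
        tlb_access(&t, base + i * ELEM_SIZE);
    return t.misses;
}
```

With `a` at 0x1000 and 2048 elements, `sequential_misses(0x1000, 2048)` returns 2, which matches the 2/2048 miss rate worked out on the performance slide: the scan touches only two 4 KB pages.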