i. Good? Bad?
i. Why? Internal bandwidth does not account for seeks, or for the fact that the OS/interrupt path is too slow to read whole tracks at a time
i. CPU in use is slow – 1 MIPS.
i. SCSI: enterprise disks, higher speed (up to 15,000 rpm), smaller diameter (2.5 in), smaller capacity. Seek time ~ 4.3 ms, more platters
ii. IDE/SATA: personal disks: lower speed (5,400 - 7,200 rpm), larger diameter (3.5 in). May not even use all surfaces available (e.g. 3 out of 4). Seek time ~9ms, fewer platters
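The spin rates above translate directly into rotational latency. A minimal sketch, assuming the usual approximation that a request waits half a revolution on average; the function name is illustrative and not from the notes:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms, assuming half a revolution on average."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


# Spin rates mentioned for personal and enterprise disks.
for rpm in (5_400, 7_200, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

At 15,000 rpm one revolution takes 4 ms, so the average wait is 2 ms; slower spindles pay proportionally more.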
i. Density: 30 GB/sq. in.
ii. ZBR: zoned bit recording. More sectors on outer tracks, so more bits are read per revolution
iii. Track ordering: serpentine. Tracks go in on top platter, out on bottom, in on next platter, then out. Cylinder idea not so good any more: platter-platter switching time may be higher than track-track seek.
iv. Old model: 3-dimensional address of a block: surface, cylinder, sector
1. Why no cylinders?
2. A: Disk geometries change a lot: tracks per inch, sectors per track (varies from track to track). Complicated to expose all this to the OS. Move to linear block addresses
3. Flaws: disk can map around flaws – move a block somewhere else. Hard to do if raw geometry exposed.
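For reference, the old 3-dimensional address maps to a linear block number with the classic fixed-geometry formula. The sketch below assumes every track holds the same number of sectors; the geometry values are hypothetical:

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Classic fixed-geometry mapping; sectors are numbered from 1."""
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Hypothetical geometry: 4 surfaces, 32 sectors on every track.
HEADS, SECTORS = 4, 32
print(chs_to_lba(0, 0, 1, HEADS, SECTORS))
print(chs_to_lba(2, 1, 5, HEADS, SECTORS))
```

The formula only works when sectors per track is a constant. ZBR and flaw remapping both break that assumption, which is one reason the disk now hides its geometry behind a linear address.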
i. provide a queue of commands, disk can do some ordering: e.g. balance both seek and rotation cost to optimize next block
ii. Can get average seek distance down to 1/10 of maximum (vs. 1/3 for ATA, non-queued)
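A rough sketch of how a disk with a command queue might pick the next request by combined seek and rotation cost. The linear seek model, the default timings, and all names are assumptions for illustration; real firmware uses far more detailed models:

```python
from typing import List, Tuple

# (track, angular position as a fraction of one revolution)
Request = Tuple[int, float]


def positioning_cost_ms(head: Request, target: Request,
                        seek_ms_per_track: float, rev_ms: float) -> float:
    """Seek time plus the rotational wait that remains once the seek completes."""
    seek_ms = abs(target[0] - head[0]) * seek_ms_per_track
    # The platter keeps turning while the arm moves.
    angle_after_seek = (head[1] + seek_ms / rev_ms) % 1.0
    rotational_wait_ms = ((target[1] - angle_after_seek) % 1.0) * rev_ms
    return seek_ms + rotational_wait_ms


def pick_next(head: Request, queue: List[Request],
              seek_ms_per_track: float = 0.01, rev_ms: float = 4.0) -> Request:
    """Choose the queued request with the lowest total positioning cost."""
    return min(queue, key=lambda r: positioning_cost_ms(head, r, seek_ms_per_track, rev_ms))


head = (100, 0.0)
queue = [(101, 0.9), (150, 0.2)]
print(pick_next(head, queue))
```

In this example the request on the adjacent track loses: its sector is almost a full revolution away, so the longer seek with a short rotational wait is cheaper overall.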
i. seek time (#1)
ii. rotational latency (#2)
i. Initially ordered, but then random as file system ages
ii. Digression on FS aging: why it is important to test old file systems
i. Need to support small files efficiently
1. At the time, most files were very small
2. 4 KB block size alone wastes 45% of disk space
ii. Need to preserve locality
1. Allocate blocks so as to minimize seek time and improve throughput
i. 4096 byte blocks
i. 512 byte fragments within a block
i. Cylinder groups
ii. 2-level allocators (inter- and intra-cylinder group)
i. Optimize intra-cylinder allocation for disk / processor capabilities:
i. Change data structures
ii. 2 level decisions (e.g. fragments / blocks, cylinder groups / sectors)
i. QUESTION: Where does locality come from?
ii. A: directories and within files
iii. QUESTION: What about other workloads?
2. Google Index?
i. QUESTION: What are they for? A: To give spatial locality to blocks that have temporal locality (blocks accessed together are stored together)
ii. Group of cylinders near each other, cheap to seek between tracks
iii. Each cylinder group has some bookkeeping information
1. Superblock = description of FS (block size)
2. Space for inodes
3. Free block bitmap
4. QUESTION: What is this technique:
a. A: Change data structure to store more information
5. Summary information on data block usage
a. # of available blocks at each rotational position (8 groups in 2 ms increments)
6. Index into block bitmap for each rotational position
iv. Static # of inodes allocated for a cylinder group
i. Each block is broken up into fragments (512+ bytes)
ii. Free block bitmap records free fragments
iii. Fragments used for:
1. Small files
2. Tails of large files
iv. Expanding a file with a fragment may require copying data
v. QUESTION: How to minimize copying due to fragments as users write data?
1. A: new system call to learn size of blocks, so can write data in complete blocks
vi. QUESTION: Is this a problem? When? There are usually optimal data access values in any system, e.g. VM pages. How much can you hide this?
vii. Benefit: Provides efficiency of small blocks plus transfer rate of large blocks
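A small sketch of the space accounting behind fragments, using the 4096-byte block and 512-byte fragment sizes from these notes. The function names are illustrative:

```python
from typing import Tuple

BLOCK_SIZE = 4096
FRAGMENT_SIZE = 512
FRAGS_PER_BLOCK = BLOCK_SIZE // FRAGMENT_SIZE


def layout(file_size: int) -> Tuple[int, int]:
    """Return (full_blocks, tail_fragments) for a file of file_size bytes."""
    full_blocks, tail_bytes = divmod(file_size, BLOCK_SIZE)
    tail_fragments = -(-tail_bytes // FRAGMENT_SIZE)  # ceiling division
    if tail_fragments == FRAGS_PER_BLOCK:
        # A tail that needs every fragment is simply another full block.
        return full_blocks + 1, 0
    return full_blocks, tail_fragments


def wasted_bytes(file_size: int, use_fragments: bool) -> int:
    """Allocated bytes minus file size, with or without fragments."""
    if use_fragments:
        full_blocks, tail_fragments = layout(file_size)
        allocated = full_blocks * BLOCK_SIZE + tail_fragments * FRAGMENT_SIZE
    else:
        allocated = -(-file_size // BLOCK_SIZE) * BLOCK_SIZE
    return allocated - file_size


print(layout(5000), wasted_bytes(5000, True), wasted_bytes(5000, False))
```

A 5000-byte file needs one full block plus two fragments. Without fragments it would occupy two full blocks, and most of the second block would be wasted.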
i. Optimize layout for disk parameters
ii. Parameters used:
1. Processor speed
2. HW support for large transfers
3. Blocks per track
4. Disk spin rate
5. Time between transfers
iii. Goal: find rotationally “optimal” blocks
1. Idea: want to read next block with minimum cost
a. Ideally, head is right before block when you want to read it
2. Depends on:
a. Transfer rate of processor
b. Time to set up next transfer
c. Speed of disk
d. Number of blocks you can read in a row
iv. Pre-allocate indexes to find a “near by” block quickly
1. Store vector of indexes into block map
2. Cylinder group stores # of free blocks at each position
3. Allocator uses vector to find blocks, then looks for ones in the right cylinder group
v. Where does this info come from?
1. Administrators – allows installing FS on one system then moving to another
2. Recent tools determine layout (from CMU)
vi. QUESTION: Why so many parameters? What do they really care about?
1. A: given a block, what is the best next block to read/write
2. What is the rotational delay between subsequent blocks that can be read?
vii. QUESTION: parameterization ties fs layout to disk, processor. Is this a problem?
1. A: what happens if move to other disk? To other CPU? Who cares?
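The question of the best next block can be sketched numerically: while the processor sets up the next transfer, some blocks rotate past the head, so the next block of the file should be placed that many slots further around the track. This is a simplified model with hypothetical parameter values, not the allocator's actual table lookup:

```python
import math


def blocks_to_skip(rpm: float, blocks_per_track: int, setup_ms: float) -> int:
    """Blocks that pass under the head while the next transfer is being set up."""
    rev_ms = 60_000.0 / rpm
    block_ms = rev_ms / blocks_per_track
    return math.ceil(setup_ms / block_ms)


def next_rotational_slot(current_slot: int, rpm: float,
                         blocks_per_track: int, setup_ms: float) -> int:
    """Slot on the track where the following block can be read with no extra wait."""
    skip = blocks_to_skip(rpm, blocks_per_track, setup_ms)
    return (current_slot + 1 + skip) % blocks_per_track


# Hypothetical disk: 3600 rpm, 8 blocks per track, 4 ms to set up a transfer.
print(next_rotational_slot(0, rpm=3600, blocks_per_track=8, setup_ms=4.0))
```

With a zero setup time the next slot is simply the adjacent one, which is the case when the hardware supports large back-to-back transfers.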
i. QUESTION: What is the ideal policy?
1. Everything near everything else
ii. QUESTION: how do you balance wanting locality and avoiding hot spots?
iii. Policy levels:
1. Global policies:
a. Use system-wide summary information to place new inodes and data blocks
i. Where do directories and files go
b. Calculate rotationally optimized block layouts
c. Decide when to force a long seek to another cylinder group because too few blocks are left in the current one
d. Request ideal block
2. Local policies:
a. Assign individual blocks within a file
b. If the ideal block is not available, find the next-best block using more accurate (local) information
1. Minimize seek latency for related accesses
2. Minimize overhead of large transfers
1. Cluster related information
2. Can’t cluster too much. QUESTION: Why?
a. Leads to hot spots; fill up cylinder groups and lead to sub-optimal allocations
b. Must spread load of unrelated files
3. Where does locality come from?
a. Files within a directory (for ls -l)
i. Place all inodes in the same cylinder group
ii. Choose group with most free inodes, fewest directories (worst fit)
iii. Within a cylinder group, inodes allocated “next fit” – randomly, but can read all inodes in 8-16 transfers
b. Blocks within a file
i. Try to put all blocks in same cylinder group as directory inode at rotationally optimal places
ii. To spread load, redirect allocation to a different cylinder group as the file grows: first at 48 KB, then at every 1 MB thereafter
iii. Global policy requests specific blocks
iv. If block not available:
1. Next closest block on same cylinder
2. Same cylinder group
3. Quadratic hash
4. Check all cylinder groups
v. Reasoning: want a close block. If not, want to find one quickly. If not that, then disk is nearly full, need to look closely
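The four-step fallback for an unavailable block can be sketched as follows. The in-memory free map, the "closest slot" measure, and the exact quadratic probe sequence are simplifying assumptions for illustration only:

```python
from typing import Dict, List, Optional, Set, Tuple

# groups[g][c] is the set of free block slots on cylinder c of cylinder group g.
Groups = List[Dict[int, Set[int]]]
Block = Tuple[int, int, int]  # (group, cylinder, slot)


def find_in_group(groups: Groups, g: int) -> Optional[Block]:
    """Return any free block in cylinder group g, lowest cylinder first."""
    for cyl, free in sorted(groups[g].items()):
        if free:
            return (g, cyl, min(free))
    return None


def allocate(groups: Groups, want: Block) -> Optional[Block]:
    """Allocate the requested block, or fall back through the four levels."""
    g, cyl, slot = want
    n = len(groups)
    free_here = groups[g].get(cyl, set())
    chosen: Optional[Block] = None
    if slot in free_here:
        chosen = want  # the requested block is free
    elif free_here:
        # 1. Next closest block on the same cylinder (closeness simplified).
        chosen = (g, cyl, min(free_here, key=lambda s: abs(s - slot)))
    if chosen is None:
        chosen = find_in_group(groups, g)  # 2. same cylinder group
    if chosen is None:
        for i in range(1, n):  # 3. quadratic hash over other groups
            chosen = find_in_group(groups, (g + i * i) % n)
            if chosen is not None:
                break
    if chosen is None:
        for other in range(n):  # 4. check all cylinder groups
            chosen = find_in_group(groups, other)
            if chosen is not None:
                break
    if chosen is not None:
        groups[chosen[0]][chosen[1]].discard(chosen[2])  # mark as allocated
    return chosen


groups: Groups = [{0: {1, 5}, 1: {2}}, {0: set()}, {0: {7}}]
for _ in range(5):
    print(allocate(groups, (0, 0, 5)))
```

Repeating the same request walks down the levels: the exact block, a neighbor on the same cylinder, another cylinder in the group, and finally a block found only by the exhaustive scan, before the disk is reported full.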
i. A: Maintain list of free blocks sorted by cylinder group
i. NOTE: reliability a big problem for disks. What can fail?
1. A surface
2. A track
3. A sector
ii. Information replicated/distributed across disk
1. Superblocks on each cylinder group
2. Superblock copies spiral down the pack (different position in each cylinder group)
a. Any single track, cylinder or platter can be lost without losing every copy
i. File locking:
1. Locks only on open files (in-core structures)
2. QUESTION: Why?
3. Only advisory; only apps that ask for locks will see them
4. QUESTION: Why?
a. Admin/system must be able to break locks.
b. QUESTION: Why not have a break-lock API?
ii. Symbolic links
1. Indirection via FS names
2. QUESTION: Benefits?
a. Links across volumes
3. QUESTION: drawbacks?
a. May break
iii. Atomic rename
1. QUESTION: Why need?
2. QUESTION: what does it do?
a. Delete old file, rename new to old
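From an application's point of view, atomic rename enables the familiar "write a temporary file, then rename it over the old one" pattern. A minimal sketch using Python's os.replace, which is atomic on POSIX when both paths are on the same file system; the helper name atomic_write is illustrative:

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace the contents of path so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk before renaming
        os.replace(tmp_path, path)  # old file is dropped and new one takes its name in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without an atomic rename, a crash between "delete old" and "rename new" would leave no file under the expected name.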
i. Answer: queuing of write traffic instead of synchronous
i. More time spent finding optimal blocks
ii. Disk I/O saturates CPU for copying data to user programs