UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department
CS 537
Spring 2000, A. Arpaci-Dusseau
Quiz #10 Solutions: Wednesday, May 3
a) Match each file allocation policy on the left with the analogous memory allocation policy on the right:

_d_ Linked                  a. Base and Bounds
_b_ Indexed                 b. Paging
_a_ Contiguous              c. Segmentation
_c_ Extent-based            d. None of the above
_d_ FAT
_b_ Multi-level Indexed

An analogy exists between a file allocation policy and a memory allocation policy when they require the same amount of infrastructure to refer to the allocated data and share the same advantages and disadvantages.
For example, both the Contiguous policy and the Base and Bounds approach require a base pointer and a size field; both suffer from external fragmentation (but not internal) and may require compacting all space to allocate more data to a file or an address space.
Likewise, Extent-based and Segmentation both have a base pointer and a size field for each contiguously allocated region (whether called an extent or a segment); these approaches are both extensions of either Contiguous or Base and Bounds.
The two Indexed versions and Paging all use fixed-size allocation units and require a pointer to each unit (whether a disk block or a page).
Finally, Linked and FAT do not have counterparts in memory allocation, since each requires that you examine the previous data block (or its index in the FAT table) to find the next data block.
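To make the analogy concrete, here is a small sketch (not part of the original solution) of the bookkeeping each policy needs, written as C structs; the field names and array sizes are illustrative assumptions only.

    /* Sketch: the metadata each allocation policy keeps to locate its data. */

    #include <stdint.h>

    /* Contiguous file allocation / Base-and-Bounds memory: one base + one size. */
    struct contiguous { uint32_t base_block; uint32_t num_blocks; };

    /* Extent-based / Segmentation: a (base, size) pair per contiguous region.  */
    struct extent      { uint32_t base_block; uint32_t num_blocks; };
    struct extent_file { struct extent extents[8]; int num_extents; };

    /* Indexed / Paging: one pointer per fixed-size unit (disk block or page).  */
    struct indexed_file { uint32_t block_ptr[1024]; uint32_t num_blocks; };

    /* Linked / FAT: each block (or FAT entry) names the next block in the file;
     * there is no analogous memory allocation scheme.                          */
    struct linked_block { uint8_t data[4092]; uint32_t next_block; };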
The remaining questions in this section are multiple-choice. In each case, circle the best answer and briefly explain your answer. Without your explanation, we will not give credit for an answer. Unless otherwise noted, you should circle only one answer.
b) The FAT approach is an optimization of which other approach? What optimization does it perform?
a) Linked   b) Indexed   c) Contiguous   d) Extent-based   e) Multi-level Indexed

Answer: a) Linked. The Linked approach includes a pointer at the end of each block to the next block in the file. The FAT approach essentially moves all of these pointers into a single table that can be cached in main memory.
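As a rough illustration of that optimization (the table size and structures below are assumptions, not the quiz's code), chasing the chain through an in-memory FAT costs only memory references, whereas the pure Linked scheme would need a disk read per hop:

    #include <stdint.h>

    #define FAT_EOF 0xFFFFFFFFu

    /* fat[b] holds the number of the block that follows block b in its file.
     * The array size is a hypothetical value.                                */
    static uint32_t fat[1u << 20];

    /* Follow the chain n steps starting from a file's first block. */
    uint32_t nth_block(uint32_t first_block, uint32_t n)
    {
        uint32_t b = first_block;
        while (n-- > 0 && b != FAT_EOF)
            b = fat[b];               /* memory reference, no disk I/O */
        return b;
    }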
c) Extent-based allocation allows files to grow more easily than which other allocation policy? Why?
a) Linked   b) Contiguous   c) FAT   d) Multi-level Indexed

Answer: b) Contiguous. Extent-based allocation allows a file to consist of multiple contiguous regions instead of just a single contiguous region. Therefore, when the file grows, if there is no free space at the end of the last region, a new extent can be added.
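A minimal sketch of that growth path, under assumed structures and hypothetical helper routines (declared extern here), might look like this; a purely Contiguous scheme has no "start a new extent" branch and would have to relocate the whole file instead.

    #include <stdbool.h>
    #include <stdint.h>

    struct extent      { uint32_t base; uint32_t len; };
    struct extent_file { struct extent ext[8]; int n; };

    extern bool     block_is_free(uint32_t block);   /* hypothetical bitmap check */
    extern uint32_t alloc_free_block(void);          /* hypothetical allocator    */

    /* Grow the file by one block (free-bitmap updates omitted in this sketch). */
    void grow_by_one_block(struct extent_file *f)
    {
        struct extent *last = &f->ext[f->n - 1];
        if (block_is_free(last->base + last->len)) {
            last->len++;                             /* extend the last extent in place */
        } else {
            f->ext[f->n].base = alloc_free_block();  /* start a new extent elsewhere */
            f->ext[f->n].len  = 1;
            f->n++;
        }
    }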
d) Which approach has the worst support for reading/writing files with a random-access pattern? Why?
a) Linked   b) Indexed   c) Contiguous   d) Extent-based   e) FAT   f) Multi-level Indexed

Answer: a) Linked. The Linked approach requires scanning through all previous disk blocks to find the location of a random disk block in the file.
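A hedged sketch of that lookup cost (read_block and the on-disk block layout are assumptions for illustration): locating block n of a linked file takes n disk reads before the block of interest can even be found.

    #include <stdint.h>

    /* Each on-disk block carries its data plus a pointer to the next block. */
    struct disk_block { uint8_t data[4092]; uint32_t next; };

    extern void read_block(uint32_t block_no, struct disk_block *out);  /* hypothetical driver call */

    /* Find the block number of the n-th block in a linked file. */
    uint32_t linked_seek(uint32_t first_block, uint32_t n)
    {
        struct disk_block blk;
        uint32_t b = first_block;
        while (n-- > 0) {            /* n disk reads just to locate the target */
            read_block(b, &blk);
            b = blk.next;
        }
        return b;
    }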
e) Which allocation schemes suffer from internal fragmentation? Circle all that apply and explain why.
a) Linked   b) Indexed   c) Contiguous   d) Extent-based   e) FAT   f) Multi-level Indexed

Internal fragmentation corresponds to exactly those policies that use fixed-size allocation units that are greater than the minimum sector size.
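As a small worked example (the 4 KB block size and 10 KB file size are made-up values), the waste appears in the partially used last allocation unit:

    #include <stdio.h>

    int main(void)
    {
        const long block_size = 4096;          /* fixed allocation unit */
        const long file_size  = 10 * 1024;     /* 10 KB file */

        long blocks = (file_size + block_size - 1) / block_size;  /* 3 blocks   */
        long wasted = blocks * block_size - file_size;            /* 2048 bytes */

        printf("blocks=%ld, internal fragmentation=%ld bytes\n", blocks, wasted);
        return 0;
    }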
f) Which allocation schemes may limit the maximum size of a file to less than that of the entire disk even when this is the only file on the disk? Circle all that apply and briefly explain why.
a) Linked   b) Contiguous   c) Extent-based   d) FAT   e) Multi-level Indexed

Answer: e) Multi-level Indexed. Multi-level indexing limits the size of a file to the amount of data that can be reached from its direct, indirect, double-indirect, and triple-indirect pointers. This is a fixed limit based on the structure of the i-node, which may not correspond to the size of the disk.
This is not true of Contiguous or Extent-based because they can modify the size field(s) to point to the entire disk. Likewise, Linked and FAT can point to a chain of disk blocks across the entire disk.
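For a concrete sense of that limit, the following sketch assumes a classic UNIX-style i-node with 12 direct pointers, one single-, one double-, and one triple-indirect pointer, 4 KB blocks, and 4-byte block pointers; none of these numbers are given in this question, so treat them purely as an illustration.

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long block = 4096;
        const unsigned long long ptrs_per_block = block / 4;        /* 1024 pointers */

        unsigned long long max_bytes =
            12ULL * block +                                          /* direct          */
            ptrs_per_block * block +                                 /* single indirect */
            ptrs_per_block * ptrs_per_block * block +                /* double indirect */
            ptrs_per_block * ptrs_per_block * ptrs_per_block * block;/* triple indirect */

        printf("max file size = %llu bytes (~%llu GB)\n", max_bytes, max_bytes >> 30);
        return 0;
    }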
a) Consider the following bitmap of free fragments in FFS. Assume that each fragment is 1KB and each block is 4KB. The letter A denotes a fragment allocated to file A, the letter B a fragment allocated to file B, and 0 a free fragment. (Note that in the real system, an allocated fragment would just be marked with a 1, not an A or B; we are simply showing the corresponding file for clarity in this question.)
FFS will not allocate a file to multiple partial blocks. Only the last block of a file can be partially allocated to that file.
Since each block has four fragments, we know that fragment #7 is in a partial block and must have previously been the last fragment of file A. When a new fragment for file A is allocated, we will need a whole new block to contain the two fragments. Therefore, fragment #7 will be moved to fragment #12 and the new fragment will be allocated to fragment #13.
b) On a disk with 4096-byte blocks, the data blocks for a large file are allocated to a new cylinder group after 48KB. What can you infer about the structure of the FFS i-node from this 48KB value?
The data blocks for a large file are moved to a new cylinder group when the first indirect block in the multi-level indexed i-node is used. Given that each block is 4096 bytes, if this happens after 48KB, then there are 48KB / 4KB = 12 direct data pointers in the i-node.
c) FFS was not designed with modern disk technology, such as multi-zone disks, in mind. Imagine that you have been asked to modify/optimize FFS to have better performance when run on multi-zone disks.
What would you suggest should be allocated on the outer cylinders of the disk? Circle only the one best answer and explain why.
a) I-nodes for files in the same directory   b) Data blocks for large files   c) Data blocks for files in the same directory   d) Data blocks for directories

Answer: b) Data blocks for large files. On a multi-zone disk, cylinders on the outside of the disk have more sectors and therefore deliver data at a higher bandwidth (given a constant RPM). The question is then: what needs the highest bandwidth? We know from file workload studies that most data is transferred in large files; therefore, to get the best performance for the workload, the file system should deliver the highest bandwidth for large files.
In the default FFS, we already know that both a), i-nodes for files in the same directory, and c), data blocks for files in the same directory, should be allocated in the same cylinder group to get good locality; however, there is no particular reason why the outer cylinder groups should be used instead of the inner cylinder groups. There is nothing special about data blocks for directories, d), that would require the highest bandwidth.
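A back-of-the-envelope sketch of the bandwidth argument; the RPM and per-track sector counts below are invented for illustration, the point being only that at a constant rotation rate, more sectors pass under the head per revolution in an outer zone.

    #include <stdio.h>

    int main(void)
    {
        const double rpm          = 7200.0;
        const double sector_bytes = 512.0;
        const double outer_sectors_per_track = 600.0;   /* hypothetical */
        const double inner_sectors_per_track = 300.0;   /* hypothetical */

        double revs_per_sec = rpm / 60.0;
        double outer_bw = outer_sectors_per_track * sector_bytes * revs_per_sec;
        double inner_bw = inner_sectors_per_track * sector_bytes * revs_per_sec;

        printf("outer zone: %.1f MB/s, inner zone: %.1f MB/s\n",
               outer_bw / 1e6, inner_bw / 1e6);
        return 0;
    }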
d) Assume that you have been given a bizarre prototype disk for which you must optimize FFS. (Forget about multi-zone disks.) On this prototype disk, it is very expensive to switch between disk heads, but seek time is negligible.
To improve performance, FFS associates workload locality with a disk organizational structure called cylinder groups. On the bizarre prototype disk, which sectors should be grouped together? Very briefly, explain why.
a) Sectors in nearby cylinders   b) Sectors on the same platter   c) Sectors on nearby platters   d) Sectors on the same surface   e) Sectors on nearby surfaces

Answer: d) Sectors on the same surface. Each disk head handles the data transfers on a single surface of the disk. Therefore, if it is expensive to switch disk heads, it is expensive to switch surfaces. We will want to group related data items on the same surface on this prototype disk.
a) What performance metric do traditional disk scheduling algorithms attempt to optimize?

Seek time.
For example, SSTF stands for Shortest-Seek-Time-First; it is trying to minimize the seek time of each transfer, i.e., to minimize the time the disk head spends moving across different cylinders.
Likewise, the SCAN algorithm (and its variants) moves the disk head from the outside cylinder to the inside cylinder and back again, handling a request when the head passes over the necessary cylinder.
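For example, the greedy selection step at the heart of SSTF can be sketched as follows (the request representation is an assumption, not from the quiz):

    #include <stdlib.h>

    struct request { int cylinder; int sector; };

    /* Return the index of the pending request with the shortest seek distance
     * from the current head position, or -1 if no requests are pending.      */
    int sstf_pick(const struct request *pending, int n, int head_cylinder)
    {
        int best = -1, best_dist = 0;
        for (int i = 0; i < n; i++) {
            int dist = abs(pending[i].cylinder - head_cylinder);
            if (best < 0 || dist < best_dist) {
                best = i;
                best_dist = dist;
            }
        }
        return best;
    }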
b) Assume that this performance metric has been optimized for a given workload (i.e., the metric is as small as possible). Given a modern disk, is it possible for the total block transfer time to be less with a different disk scheduling algorithm? Clearly explain why or why not. (Drawing a picture or giving an example may help your explanation.)
Yes, it is possible for a different scheduling algorithm to have the same (or even worse) seek time, but still complete the disk workload more quickly if it takes rotational delay into account. Rotational delay is the time spent waiting for a particular sector on a track to spin around the disk and reach the disk head. Both seek time and rotational delay determine the time spent waiting before the actual transfer can begin.
In fact, recent disk scheduling algorithms do take both rotational latency and seek time into account. On modern disks, the time to seek across all cylinders and to wait for the disk to complete a revolution are pretty similar.
For example, consider a case where the disk head is currently at cylinder 1, sector 40, and there is a workload of three requests:
On the other hand, if the algorithm minimizes rotational delay, it will handle the request for sector 46, then 48, then 50. The disk arm will never have to wait for the entire disk to spin all the way around, and rotational delay will be small. Seek time is still low in this example as well. Therefore, the total time to handle this workload is smaller with the second algorithm than with the first, which was optimal in terms of seek time.
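A rough cost model of the point above, showing how seek time and rotational delay combine into the waiting time before each transfer begins; the parameters (1 ms per cylinder crossed, 7200 RPM, 100 sectors per track) are made up for illustration and are not the quiz's example.

    #include <stdlib.h>

    #define SECTORS_PER_TRACK 100
    #define MS_PER_CYLINDER   1.0
    #define MS_PER_REV        (60000.0 / 7200.0)     /* ~8.33 ms per revolution */

    /* Positioning time (seek + rotational delay) for one request, given the
     * current head position.  The platter keeps rotating while the head seeks. */
    double position_ms(int cur_cyl, int cur_sec, int req_cyl, int req_sec)
    {
        double seek       = abs(req_cyl - cur_cyl) * MS_PER_CYLINDER;
        double sec_passed = seek / MS_PER_REV * SECTORS_PER_TRACK;  /* sectors rotated by during the seek */
        double arrive_sec = cur_sec + sec_passed;
        double wait_secs  = req_sec - arrive_sec;
        while (wait_secs < 0)
            wait_secs += SECTORS_PER_TRACK;          /* wait for the sector to come around */
        return seek + wait_secs * (MS_PER_REV / SECTORS_PER_TRACK);
    }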