CS 537 Notes, Section #23: Files, Disk Management
OSTEP: Chapters 36, 37
File: a named collection of bits stored on disk. From
the OS' standpoint, the file consists of a bunch of blocks
stored on the device. Programmer may actually see a different
interface (bytes or records), but this does not matter to the
file system (just pack bytes into blocks, unpack them again
on reading).
Common addressing patterns:
-
Sequential: information is processed in order, one piece
after the other. This is by far the most common mode:
e.g. editor writes out new file, compiler compiles it,
etc.
-
Random Access: can address any record in the file directly
without passing through its predecessors. E.g. the data set
for demand paging, also databases.
-
Keyed: search for records with particular values, e.g.
hash table, associative database, dictionary. Usually not provided by
operating system.
TLB is one example of a keyed search.
Modern file systems must address four general problems:
-
Disk Management: efficient use of disk space, fast access
to files, sharing of space between several users.
-
Naming: how do users select files?
-
Protection: all users are not equal.
-
Reliability: information must last safely for long periods
of time.
Disk Management: how should the disk sectors be used to
represent the blocks
of a file? The structure used to describe which sectors
represent a file is called the file descriptor.
Contiguous allocation:
allocate files like segmented memory
(give each disk sector a number from 0 up). Keep
a free list of unused areas of the disk. When creating a file,
make the user specify its length, allocate all the space at once.
Descriptor contains location and size.
-
Advantages: easy access, both sequential and random.
Simple. Few seeks.
-
Drawbacks: horrible fragmentation will preclude large files,
hard to predict needs. With interleaved user requests, still
cannot eliminate all seeks.
Linked Files:
In the file descriptor, just keep pointer to first block.
In each block of file keep pointer to next block.
Can also keep a linked list of free blocks for the free list.
-
Advantages: Files can be extended, no fragmentation problems.
Sequential Access is easy: just chase links.
No extra space for the free list.
-
Drawbacks: Random access is virtually impossible.
Lots of seeking, even in sequential access.
Pointer to next block occupies same block as data.
-
Example: FAT (MSDOS) file system.
Array of Block Pointers (Extents):
Allocate an array to hold pointers to all the blocks, but do not allocate the blocks.
Then fill in the pointers dynamically using a free list.
-
Advantages: Both sequential and random access are easy.
-
Drawbacks: still have to set maximum file size, and there
will be lots of seeks.
DOS FAT File File System:
A single File Allocation Table (FAT)
that combines free list info and file allocation info.
In file descriptor, keep pointer to first block.
A FAT table entry contains either (1) the block number of the next
block in the file, (2) a distinguished "end of file" (eof) value, or
(3) a distinguished "free" value.
-
Advantages/Disadvantages: similar to those mentioned above for linked
file.
None of these is a very good solution: what is the answer? First,
and MOST IMPORTANT: understand the application. How are file
systems used?
-
Most files are small.
-
Much of the disk is allocated to large files.
-
Many of the I/O's are made to large files.
Thus, the per-file cost must be low, but the performance of large files
must be good. Must allow reasonably fast random access, extensibility.
Copyright © 2013, 2018 Barton P. Miller
Non-University of Wisconsin students and teachers are welcome
to print these notes their personal use.
Further reproduction requires permission of the author.