CS 537
Programming Assignment V
File Systems

Due Date:

May 10th at 11:59pm.


Contents


Introduction

You are to design and implement a simple file system on top of a simulated disk.

Simulated Disk

The simulated disk uses a Unix file named DISK to simulate a disk with NUM_BLOCKS blocks of BLOCK_SIZE bytes per block. It supports three methods:

In each case blockNum is required to be in range 0..Disk.NUM_BLOCKS-1 and buffer should be a byte array of exactly Disk.BLOCK_SIZE bytes. (There are also overloaded versions of read and write as described below.) Note that blocks must be read and written as a whole. If you need to read part of a block, you must read in the entire block and ignore the part you're not interested in. If you need to write part of a block, you must read in the whole block, modify the portion of interest, and write the whole block back out. The constructor looks for a file named DISK in the current directory. If it does not exist, the program will create it and initialize the first block to all zeros. Any other block must be written at least once before it can be read. The stop method prints statistics. It has an optional argument (default true) to indicate whether to remove the DISK file. Since the DISK file can be quite large, you should be sure to remove it before logging off.

You will need to implement files on the simulated disk and some way of allocating disk blocks. You will use an adaptation of the method used by Unix. (In fact, the scheme described below is very similar to the one used by the original version of Unix-the so-called ``Sixth Edition'' version for the PDP-11).

Super Block

Block 0 of the disk is the so-called ``super block'' , which contains information about the rest of the disk. You will want to keep a copy of this block in memory at all times. It should be read in when the file system starts up, and written back out before shutting down. The super block should hold the following variables:


class SuperBlock {
    public int size;
    public int isize;
    public int freeList;
}

The size of the file system is recorded in the super block to allow the file system to use less than the whole disk and to support various sizes of disk. In all the data structures on disk, a ``pointer'' to a disk block is an integer in the range 1..NUM_BLOCKS-1. Since block 0 is treated specially, you can use a block number of zero to represent a null pointer.

Free Space

You will use the free list space management technique discussed in Section 11.3.3 of the text on page 384. More specifically, each block of the free list contains Disk.POINTERS_PER_BLOCK block numbers, where POINTERS_PER_BLOCK = BLOCKSIZE/4 (4 is the size of an integer in bytes). The first of these is the block number of the next block on the free list. The remaining entries are block numbers of additional free blocks (whose contents are assumed to be meaningless). While the system is running, you will want to keep a copy of the first block of the free list in memory.

File Structure

The technique is third method described in Section 11.2.3 of the text on page 379. Each file in the system is described by an index node (inode for short).1


class Inode {
    public final static int SIZE = 64;    // size in bytes
    public int flags;
    public int owner;
    public int size;
    public int ptr[] = new int[13];
}
If the flags field is zero, the index block is unused. In a real file system, the bits of this int distinguish different types of file (directories, data files, etc.) and indicate permissions. You do not have to implement these features. Similarly, you may ignore the owner field. The size field indicates the current size of the file, in bytes.

Block 0 of the disk is the super block. Blocks 1 through isize are packed with inodes.1


    class InodeBlock {
        public Inode node[] = new Inode[Disk.BLOCK_SIZE/Inode.SIZE];
    }
The remaining blocks may be allocated as direct or indirect blocks, or placed on the free list. They are collectively known as data blocks.

The data blocks that contain the contents of the files are called direct blocks. The ptr fields in an inode point (directly or indirectly) to these blocks. The first 10 pointers point to the first 10 direct blocks. The 11th pointer (ptr[10]) points to an indirect block. This block contains pointers to the next Disk.BLOCK_SIZE/4 direct blocks of the file.1


class IndirectBlock {
    public int ptr[] = new int[Disk.BLOCK_SIZE/4];
}
(Note that the blocks on the free list have the same format). Pointer ptr[11] points to a ``doubly indirect'' block. It is filled with pointers to indirect blocks, each of which contains pointers to data blocks. Similarly, the final pointer points to a ``triply indirect'' block. The size of the file is determined by the size field of the inode, not by the pointers.

A null pointer (either in the inode or in one of the indirect blocks) may indicate a hole in the file. For example, if the size field indicates that there should be five blocks, but ptr[2]==0, then the third block constitutes a hole. Similarly, if the file is large enough and ptr[10]==0, then blocks 11 through POINTERS_PER_BLOCK + 10 are all holes. Attempts to read from a hole act as if the hole were filled with zeros; an attempt to write into a hole causes the hole to be ``filled in'': Blocks are allocated as necessary and added to the file. Holes are created by seeking beyond the end of the file and then writing.

Inodes are numbered consecutively starting at 1 (not zero!), so block 1 of the disk contains inodes 1..Disk.BLOCK_SIZE/Inode.SIZE, and so on. Files are referenced by these numbers (called ``index numbers'' , or inumbers for short). In a real file system, directory files are used to translate mnemonic names to inumbers, but for this project, we will use inumbers directly.

Other Disk Operations

The data structures SuperBlock, InodeBlock, and IndirectBlock are all the same size, so any one of them can be written to or read from any disk block. For your convenience, we have added three ``overloaded'' versions of read and write to the Disk interface.


    class Disk {
        public Disk() {
        public void read(int blocknum, byte[] buffer) {}
        public void read(int blocknum, SuperBlock block) {}
        public void read(int blocknum, InodeBlock block) {}
        public void read(int blocknum, IndirectBlock block) {}
        public void write(int blocknum, byte[] buffer) {}
        public void write(int blocknum, SuperBlock block) {}
        public void write(int blocknum, InodeBlock block) {}
        public void write(int blocknum, IndirectBlock block) {}
        public void stop() {}
    }

Operations

You must implement the class FileSystem that contains the following ten methods.


class FileSystem {
        public int formatDisk(int size, int isize){}
        public int shutdown(){}
        public int create(){}
        public int inumber(int fd){}
        public int open(int inumber){}
        public int read(int fd, byte[] buffer){}
        public int write(int fd, byte[] buffer){}
        public int seek(int fd, int offset, int whence){}
        public int close(int fd){}
        public int delete(int inumber){}
}
In the tradition of C programming, each function returns an integer value, with -1 meaning ``error'' and a non-negative value (0 unless specified otherwise) meaning ``success.''2

Implementation Hints

Although this is a large project, it should be manageable if you break it down into small pieces. Here is one way (but not the only possible way!) to decompose the problem. The tasks are listed roughly in the order they are needed, although in some cases they are inter-dependent.

Free-space management.
Write methods for allocating and freeing disk blocks. Also write the portion of formatDisk that builds the free list in the first place.
Block access within a file.
Write a method that takes an Inode and a block-offset within the file and returns a pointer to (the block number of) the corresponding block. The method should have a Boolean argument fillHole which specifies what to do if the indicated block does not exist. If the block does not exist and fillHole is false, the method should simply return 0; if fillHole is true, the method should allocate a block, add it to the file, and return its block number. The first version is appropriate for read and the other is appropriate for write. You might want to first write and debug this method for ``small'' files (no indirect blocks) and then modify it to handle large files as well. For large files, if fillHole is true, you may need to allocate one or two indirect blocks and add them to the file.
The file table.
You will need a data structure to keep track of open files. For each file, you will need a pointer to an in-memory copy of its inode, its inumber, and the current seek pointer. You will need methods for allocating and freeing slots in this table, determining whether a file descriptor is valid, and accessing the data associated with a file descriptor.
Accessing inodes.
You will need methods to read a specific inode from disk (given its inumber) and writing back a modified inode. Remember that you can only read and write whole blocks, so to write an inode, you will have to read the block containing it, modify the specific inode, and then write the block back out. You will also need a method to allocate inodes.
Reading and writing arbitrary ranges.
At this point, implementing read and write should not be too hard. An individual read request may touch parts of several blocks, so you will need a loop that reads each of the blocks and copies the appropriate portion of it into the appropriate part of the buffer argument of the read call. The implementation of write is slightly more complicated because if a block is only partially modified, you have to read its old value, copy data from the client's buffer into the appropriate portion of the block, and then write it back out.
Miscellaneous.
Fill in the remaining methods of class FileSystem. The only non-trivial remaining piece is delete, which must return all the data and indirect blocks to the free list and clear the flags field of the inode. It must also check that the file is not currently open.

When you get done, you should thoroughly test all the ten required functions, including creating, reading, writing, closing, reopening, and clearing all sorts of files (small, large, filled with holes, etc.) You should also test the error checking in your code. The main program we supply should be very handy in helping you to do this.

Extra Credits

If you get everything working and throughly tested early, you might consider adding the following two extra-credit features.

Each of the two parts count for 10% extra credit for the project.

I cannot stress too strongly, however, that you should not even think of adding these features until the required part of the project is completely written and debugged.

Program Structure

We have provided several files, all of which may be found in the directory


    ~cs537-2/public/project5/

The Command Interpreter

The main method in class Proj5 implements a simple command interpreter. You can either use it interactively by invoking the program as


    java Proj5
or you can have it run a test script by typing, for example,

    java Proj5 test1
Input lines starting with ``/*'' or ``//'' are ignored (the latter are echoed to the output). Other lines have the format

    [ var = ] command [ args ]
The optional prefix var = causes the result of the command to be assigned to a variable. In any case, the result of the command is printed. The there is one command for each of the ten methods of class FileSystem as well as three additional commands: help, vars, and quit. The help command prints a list of commands, the vars command lists the current values of all interpreter variables that have been assigned values, and the quit command terminates the program. With the exception of the second argument to write, each argument can be either an integer or the name of a variable. The command

    write fd pattern size
writes size bytes to the indicated file at the current offset. The data is generated by repeating pattern over and over the required number of times.

Grading

You are to prepare a report describing the design and structure of your directory and file system. The report should be not more than two typewritten pages, not including diagrams. You should carefully describe all design decisions you made and explain how these decisions affect the performance of your file/directory system. You may assume that this handout is part of the documentation of your program. Thus you need not repeat information that is in this handout.

Handing In

You must work in groups of 2 for this project.

As always, points will be deducted for code that fails to satisfy the minimal criteria for comments and structure specified in the hand-in directions for project number 2.


1There is also an artifact of Java here that would not be present in a real system. In Java, the Inode structure is stored in memory as three integers followed by a pointer to an array of thirteen more integers. There would also be additional information to indicate the type of the Inode structure and the size of the array. On disk, however, the Inode structure is simply 16 integers in a row, like the C structure


    struct inode {
        int flags;
        int owner;
        int size;
        int ptr[13];
    };
Unfortunately, there's no easy way to create exactly this structure in memory in Java, but fortunately, you will probably never notice the difference. Similar remarks apply to InodeBlock and IndirectBlock.

2A real system would need some way to indicate what sort of error occurred. In Unix, the nature of the error is indicated by an integer error code placed in a global variable called errno. For this project, you can just print an error message. A more ``Java-like'' design would use exceptions to indicate errors.

3In real Unix, this array is split into three parts. Each process has its own table of open files. There is a single system-wide table of so-called ``in-core inodes'' shared among all processes. Each entry in this table has a reference count so that it can be removed when the last process closes the file. Seek pointers are kept in yet another system-wide table so that there can be multiple seek pointers into the same file, and multiple processes can share a seek pointer. For this project, you can combine all this information into one table.

4In real Unix, the argument is a pathname. The file system uses the directories to translate this name into an inumber.

5In real Unix, deletion is delayed until all processes that have the file open close it.


cao@cs.wisc.edu