CS 537
Programming Assignment V
File Systems

Due Date:

May 10th at 11:59pm.

Introduction

You are to design and implement a simple file system on top of a simulated disk.

Simulated Disk

The simulated disk uses a Unix file named DISK to simulate a disk with NUM_BLOCKS blocks of BLOCK_SIZE bytes per block. It supports three methods:

void read(int blockNum, byte[] buffer);
// Read a block from the disk into a buffer
void write(int blockNum, byte[] buffer);
// Write a block from a buffer onto the disk
int stop(boolean removeFile);
// Stop the disk and report how many reads and writes took place. If removeFile is true, it will also delete the DISK file.

In each case blockNum is required to be in range 0..Disk.NUM_BLOCKS-1 and buffer should be a byte array of exactly Disk.BLOCK_SIZE bytes. (There are also overloaded versions of read and write as described below.) Note that blocks must be read and written as a whole. If you need to read part of a block, you must read in the entire block and ignore the part you're not interested in. If you need to write part of a block, you must read in the whole block, modify the portion of interest, and write the whole block back out. The constructor looks for a file named DISK in the current directory. If it does not exist, the program will create it and initialize the first block to all zeros. Any other block must be written at least once before it can be read. The stop method prints statistics. It has an optional argument (default true) to indicate whether to remove the DISK file. Since the DISK file can be quite large, you should be sure to remove it before logging off.
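The read-modify-write rule for partial blocks can be sketched as follows. This is only an illustration: ToyDisk is a hypothetical in-memory stand-in for the supplied Disk class, its block size and block count are made-up values, and writePartial is a helper name invented here, not part of the required interface.

```java
// Hypothetical in-memory stand-in for the supplied Disk class (assumption:
// the real class has the same read/write shape but is backed by the DISK file).
class ToyDisk {
    static final int BLOCK_SIZE = 512;   // assumed value, for illustration only
    static final int NUM_BLOCKS = 16;    // assumed value, for illustration only
    private final byte[][] blocks = new byte[NUM_BLOCKS][BLOCK_SIZE];

    void read(int blockNum, byte[] buffer) {
        System.arraycopy(blocks[blockNum], 0, buffer, 0, BLOCK_SIZE);
    }

    void write(int blockNum, byte[] buffer) {
        System.arraycopy(buffer, 0, blocks[blockNum], 0, BLOCK_SIZE);
    }
}

class PartialWriteSketch {
    // Overwrite data.length bytes starting at byte `offset` within one block.
    static void writePartial(ToyDisk disk, int blockNum, int offset, byte[] data) {
        if (offset < 0 || offset + data.length > ToyDisk.BLOCK_SIZE) {
            throw new IllegalArgumentException("range does not fit in one block");
        }
        byte[] block = new byte[ToyDisk.BLOCK_SIZE];
        disk.read(blockNum, block);                             // 1. read the whole block
        System.arraycopy(data, 0, block, offset, data.length);  // 2. modify the portion of interest
        disk.write(blockNum, block);                            // 3. write the whole block back
    }
}
```

The bytes outside the modified range survive only because the whole block was read first; skipping step 1 would overwrite them with zeros.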

You will need to implement files on the simulated disk and some way of allocating disk blocks. You will use an adaptation of the method used by Unix. (In fact, the scheme described below is very similar to the one used by the original version of Unix, the so-called ``Sixth Edition'' version for the PDP-11.)

Super Block

Block 0 of the disk is the so-called ``super block'', which contains information about the rest of the disk. You will want to keep a copy of this block in memory at all times. It should be read in when the file system starts up, and written back out before shutting down. The super block should hold the following variables:


class SuperBlock {
    public int size;
    public int isize;
    public int freeList;
}

int size;
    Number of blocks in the file system.
int isize;
    Number of index blocks.
int freeList;
    First block of the free list.

The remainder of the super block will just be filler.

The size of the file system is recorded in the super block to allow the file system to use less than the whole disk and to support various sizes of disk. In all the data structures on disk, a ``pointer'' to a disk block is an integer in the range 1..NUM_BLOCKS-1. Since block 0 is treated specially, you can use a block number of zero to represent a null pointer.

Free Space

You will use the free list space management technique discussed in Section 11.3.3 of the text on page 384. More specifically, each block of the free list contains Disk.POINTERS_PER_BLOCK block numbers, where POINTERS_PER_BLOCK = BLOCK_SIZE/4 (4 is the size of an integer in bytes). The first of these is the block number of the next block on the free list. The remaining entries are block numbers of additional free blocks (whose contents are assumed to be meaningless). While the system is running, you will want to keep a copy of the first block of the free list in memory.
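One possible shape for the allocate and free operations on this kind of free list is sketched below. It is a toy model under stated assumptions: the disk is an in-memory array of int arrays rather than the real Disk, POINTERS_PER_BLOCK is set to 4 so the behavior is easy to trace, a zero entry is taken to mean an empty slot, and the class and method names are invented for this example. It also works on the blocks directly instead of keeping a cached copy of the head block.

```java
import java.util.Arrays;

// Toy model: each "block" is an array of POINTERS_PER_BLOCK ints, laid out
// like an IndirectBlock. Sizes are assumptions chosen to keep the demo small.
class FreeListSketch {
    static final int POINTERS_PER_BLOCK = 4;   // tiny on purpose; really BLOCK_SIZE/4
    final int[][] disk;                        // stand-in for the simulated disk
    int freeList = 0;                          // plays the role of SuperBlock.freeList; 0 = empty

    FreeListSketch(int numBlocks, int firstDataBlock) {
        disk = new int[numBlocks][POINTERS_PER_BLOCK];
        // The formatDisk step: put every data block on the free list.
        for (int b = numBlocks - 1; b >= firstDataBlock; b--) {
            free(b);
        }
    }

    // Return a block to the free list.
    void free(int blockNum) {
        if (freeList != 0) {
            int[] head = disk[freeList];
            for (int i = 1; i < POINTERS_PER_BLOCK; i++) {
                if (head[i] == 0) {            // empty slot in the head block
                    head[i] = blockNum;
                    return;
                }
            }
        }
        // No list yet, or the head is full: the freed block becomes the new head.
        Arrays.fill(disk[blockNum], 0);
        disk[blockNum][0] = freeList;          // entry 0 links to the old head
        freeList = blockNum;
    }

    // Take a block off the free list; returns 0 (the null pointer) if none is left.
    int allocate() {
        if (freeList == 0) {
            return 0;
        }
        int[] head = disk[freeList];
        for (int i = POINTERS_PER_BLOCK - 1; i >= 1; i--) {
            if (head[i] != 0) {                // hand out a block listed in the head
                int b = head[i];
                head[i] = 0;
                return b;
            }
        }
        // The head lists nothing else, so the head block itself is handed out.
        int b = freeList;
        freeList = head[0];
        return b;
    }
}
```

Note that a block handed out by allocate may still contain stale free-list data, so a caller that uses it as an indirect block would need to zero it first.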

File Structure

The technique is the third method described in Section 11.2.3 of the text on page 379. Each file in the system is described by an index node (inode for short).¹


class Inode {
    public final static int SIZE = 64;    // size in bytes
    public int flags;
    public int owner;
    public int size;
    public int ptr[] = new int[13];
}

If the flags field is zero, the index block is unused. In a real file system, the bits of this int distinguish different types of file (directories, data files, etc.) and indicate permissions. You do not have to implement these features. Similarly, you may ignore the owner field. The size field indicates the current size of the file, in bytes.

Block 0 of the disk is the super block. Blocks 1 through isize are packed with inodes.¹


    class InodeBlock {
        public Inode node[] = new Inode[Disk.BLOCK_SIZE/Inode.SIZE];
    }

The remaining blocks may be allocated as direct or indirect blocks, or placed on the free list. They are collectively known as data blocks.

The data blocks that contain the contents of the files are called direct blocks. The ptr fields in an inode point (directly or indirectly) to these blocks. The first 10 pointers point to the first 10 direct blocks. The 11th pointer (ptr[10]) points to an indirect block. This block contains pointers to the next Disk.BLOCK_SIZE/4 direct blocks of the file.¹


class IndirectBlock {
    public int ptr[] = new int[Disk.BLOCK_SIZE/4];
}

(Note that the blocks on the free list have the same format). Pointer ptr[11] points to a ``doubly indirect'' block. It is filled with pointers to indirect blocks, each of which contains pointers to data blocks. Similarly, the final pointer points to a ``triply indirect'' block. The size of the file is determined by the size field of the inode, not by the pointers.

A null pointer (either in the inode or in one of the indirect blocks) may indicate a hole in the file. For example, if the size field indicates that there should be five blocks, but ptr[2]==0, then the third block constitutes a hole. Similarly, if the file is large enough and ptr[10]==0, then blocks 11 through POINTERS_PER_BLOCK + 10 are all holes. Attempts to read from a hole act as if the hole were filled with zeros; an attempt to write into a hole causes the hole to be ``filled in'': Blocks are allocated as necessary and added to the file. Holes are created by seeking beyond the end of the file and then writing.
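To make the pointer layout concrete, the sketch below maps a zero-based block offset within a file to the chain of indices that leads to it: first the index into the inode's ptr array, then one index per level of indirect block. The class name, the method name, and the choice to return the chain as an int array are all assumptions made for this illustration; the handout does not prescribe any of them.

```java
class BlockPathSketch {
    static final int DIRECT = 10;   // ptr[0..9] are direct pointers, per the text

    // Returns {inode ptr index, index at level 1, index at level 2, ...}
    // for the file block at zero-based `blockOffset`, where p is the number
    // of pointers per block.
    static int[] path(long blockOffset, int p) {
        if (blockOffset < 0) {
            throw new IllegalArgumentException("negative block offset");
        }
        if (blockOffset < DIRECT) {
            return new int[] {(int) blockOffset};
        }
        long rest = blockOffset - DIRECT;
        long span = p;              // blocks reachable through ptr[10], then ptr[11], ...
        for (int levels = 1; levels <= 3; levels++) {
            if (rest < span) {
                int[] result = new int[levels + 1];
                result[0] = DIRECT + levels - 1;    // 10, 11, or 12
                long div = span / p;                // blocks covered by one top-level entry
                for (int i = 1; i <= levels; i++) {
                    result[i] = (int) (rest / div); // index within this indirect block
                    rest %= div;
                    div /= p;
                }
                return result;
            }
            rest -= span;
            span *= p;
        }
        throw new IllegalArgumentException("beyond the triply indirect range");
    }
}
```

With 128 pointers per block (an assumed value), offset 9 yields {9}, offset 10 yields {10, 0}, and offset 138 is the first block reached through the doubly indirect pointer, yielding {11, 0, 0}. A zero pointer found anywhere along the chain means the block is a hole.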

Inodes are numbered consecutively starting at 1 (not zero!), so block 1 of the disk contains inodes 1..Disk.BLOCK_SIZE/Inode.SIZE, and so on. Files are referenced by these numbers (called ``index numbers'', or inumbers for short). In a real file system, directory files are used to translate mnemonic names to inumbers, but for this project, we will use inumbers directly.
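Because inumbers start at 1 and inode blocks start at disk block 1, locating an inode takes two off-by-one adjustments. A small sketch of the arithmetic follows; the block size of 512 is an assumption for the example (real code would use Disk.BLOCK_SIZE), and the helper names are hypothetical.

```java
class InumberSketch {
    static final int BLOCK_SIZE = 512;   // assumed; use Disk.BLOCK_SIZE in real code
    static final int INODE_SIZE = 64;    // Inode.SIZE from the handout
    static final int INODES_PER_BLOCK = BLOCK_SIZE / INODE_SIZE;

    // Disk block holding the given inode. Inumbers start at 1 and inode
    // blocks start at block 1, hence the two adjustments by one.
    static int blockOf(int inumber) {
        return 1 + (inumber - 1) / INODES_PER_BLOCK;
    }

    // Index of the inode within InodeBlock.node[].
    static int slotOf(int inumber) {
        return (inumber - 1) % INODES_PER_BLOCK;
    }

    // Valid inumbers run from 1 through isize * INODES_PER_BLOCK.
    static boolean isValid(int inumber, int isize) {
        return inumber >= 1 && inumber <= isize * INODES_PER_BLOCK;
    }
}
```

With these assumed sizes there are 8 inodes per block, so inumber 8 is the last inode of block 1 and inumber 9 is the first inode of block 2.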

Other Disk Operations

The data structures SuperBlock, InodeBlock, and IndirectBlock are all the same size, so any one of them can be written to or read from any disk block. For your convenience, we have added three ``overloaded'' versions of read and write to the Disk interface.


    class Disk {
        public Disk() {}
        public void read(int blocknum, byte[] buffer) {}
        public void read(int blocknum, SuperBlock block) {}
        public void read(int blocknum, InodeBlock block) {}
        public void read(int blocknum, IndirectBlock block) {}
        public void write(int blocknum, byte[] buffer) {}
        public void write(int blocknum, SuperBlock block) {}
        public void write(int blocknum, InodeBlock block) {}
        public void write(int blocknum, IndirectBlock block) {}
        public void stop() {}
    }

Operations

You must implement the class FileSystem that contains the following ten methods.


class FileSystem {
        public int formatDisk(int size, int isize){}
        public int shutdown(){}
        public int create(){}
        public int inumber(int fd){}
        public int open(int inumber){}
        public int read(int fd, byte[] buffer){}
        public int write(int fd, byte[] buffer){}
        public int seek(int fd, int offset, int whence){}
        public int close(int fd){}
        public int delete(int inumber){}
}

In the tradition of C programming, each function returns an integer value, with -1 meaning ``error'' and a non-negative value (0 unless specified otherwise) meaning ``success.''²

The method formatDisk initializes the disk to the state representing an empty file-system: It fills in the super block and links all the data blocks into the free list.
The method shutdown closes all open files and shuts down the simulated disk.
The method create creates a new empty file, and open locates an existing file. Each method returns an integer in the range from 0 through 20 called a ``file descriptor'' (fd for short). The fd is an index into an array called a ``file descriptor table'' representing open files. Each entry is associated with one file and also contains a ``file pointer'' (initially zero).³
The argument to open is the inumber of an existing file.⁴ The method inumber returns the inumber of the file corresponding to an open file descriptor.
The methods read, write, seek, and close behave similarly to their Unix counterparts. The method read reads up to buffer.length bytes starting at the current seek pointer. The return value is the number of bytes read. If there are fewer than buffer.length bytes between the current seek pointer and the end of the file (as indicated by the size field in the inode), only the remaining bytes are read. In particular, if the current seek pointer is greater than or equal to the file size, read returns zero and the buffer is unmodified. (The current seek pointer cannot be less than zero). The seek pointer is incremented by the number of bytes read.
The method write transfers buffer.length bytes from buffer to the file starting at the current seek pointer and advances the seek pointer by that amount. It is not an error if the seek pointer is greater than the size of the file. In this case, holes may be created.
The method seek modifies the seek pointer as follows.
```
    public static final int SEEK_SET = 0;
    public static final int SEEK_CUR = 1;
    public static final int SEEK_END = 2;
    ...
    switch (whence) {
        case SEEK_SET: seekPointer = offset; break;
        case SEEK_CUR: seekPointer += offset; break;
        case SEEK_END: seekPointer = file_size + offset; break; 
    }
```
In case 0 (SEEK_SET), the offset is from the beginning of the file. In case 1 (SEEK_CUR), the offset is relative to the current seek pointer. In case 2 (SEEK_END), the offset is relative to the end of the file. For cases 1 and 2, the value of the parameter offset can be positive or negative; however, the resulting seekPointer must always be positive or zero. If a call to seek would result in a negative value for the seek pointer, the seek pointer is unchanged and the call returns -1. Otherwise, the value returned is the new seek pointer (distance in bytes from the start of the file).
The method close writes the inode back to disk and frees the file table entry.
The method delete frees the inode and all of the blocks of the file. It is an error to delete a file that is currently open.⁵
The method shutdown closes all open files, flushes all in-memory copies of disk structures out to disk, calls the stop() function on the disk, and prints any debugging or statistical information you deem worthwhile.
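The seek rules described above, including the requirement that a failed seek leaves the pointer unchanged, might be put together as in the following sketch. It models a single open file with two plain fields instead of a file table entry; the class name, the use of a long intermediate to avoid overflow, and the choice to return -1 for an unknown whence value are assumptions of this example rather than requirements stated in the handout.

```java
class SeekSketch {
    public static final int SEEK_SET = 0;
    public static final int SEEK_CUR = 1;
    public static final int SEEK_END = 2;

    int seekPointer = 0;     // the file pointer, initially zero
    final int fileSize;      // would come from the size field of the inode

    SeekSketch(int fileSize) {
        this.fileSize = fileSize;
    }

    // Returns the new seek pointer, or -1 (leaving the pointer unchanged) on error.
    int seek(int offset, int whence) {
        long target;         // long, so that base + offset cannot overflow
        switch (whence) {
            case SEEK_SET: target = offset; break;
            case SEEK_CUR: target = (long) seekPointer + offset; break;
            case SEEK_END: target = (long) fileSize + offset; break;
            default: return -1;          // assumption: unknown whence is an error
        }
        if (target < 0 || target > Integer.MAX_VALUE) {
            return -1;                   // pointer is deliberately left untouched
        }
        seekPointer = (int) target;
        return seekPointer;
    }
}
```

Seeking past the end of the file succeeds here, which is what allows a later write to create a hole.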

Implementation Hints

Although this is a large project, it should be manageable if you break it down into small pieces. Here is one way (but not the only possible way!) to decompose the problem. The tasks are listed roughly in the order they are needed, although in some cases they are inter-dependent.

Free-space management: Write methods for allocating and freeing disk blocks. Also write the portion of formatDisk that builds the free list in the first place.
Block access within a file: Write a method that takes an Inode and a block-offset within the file and returns a pointer to (the block number of) the corresponding block. The method should have a Boolean argument fillHole which specifies what to do if the indicated block does not exist. If the block does not exist and fillHole is false, the method should simply return 0; if fillHole is true, the method should allocate a block, add it to the file, and return its block number. The first version is appropriate for read and the other is appropriate for write. You might want to first write and debug this method for ``small'' files (no indirect blocks) and then modify it to handle large files as well. For large files, if fillHole is true, you may need to allocate one or more indirect blocks and add them to the file.
The file table: You will need a data structure to keep track of open files. For each file, you will need a pointer to an in-memory copy of its inode, its inumber, and the current seek pointer. You will need methods for allocating and freeing slots in this table, determining whether a file descriptor is valid, and accessing the data associated with a file descriptor.
Accessing inodes: You will need methods to read a specific inode from disk (given its inumber) and to write back a modified inode. Remember that you can only read and write whole blocks, so to write an inode, you will have to read the block containing it, modify the specific inode, and then write the block back out. You will also need a method to allocate inodes.
Reading and writing arbitrary ranges: At this point, implementing read and write should not be too hard. An individual read request may touch parts of several blocks, so you will need a loop that reads each of the blocks and copies the appropriate portion of it into the appropriate part of the buffer argument of the read call. The implementation of write is slightly more complicated because if a block is only partially modified, you have to read its old value, copy data from the client's buffer into the appropriate portion of the block, and then write it back out.
Miscellaneous: Fill in the remaining methods of class FileSystem. The only non-trivial remaining piece is delete, which must return all the data and indirect blocks to the free list and clear the flags field of the inode. It must also check that the file is not currently open.
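As an illustration of the read loop described in the list above, the sketch below copies an arbitrary byte range out of a file whose blocks are held in a plain array. Everything about the setup is an assumption made for the demo: the block size is 8 bytes, the file's blocks are passed in directly (a real implementation would look each one up through the inode and read it from the disk), and a null entry stands for a hole.

```java
import java.util.Arrays;

class RangeReadSketch {
    static final int BLOCK_SIZE = 8;   // tiny assumed block size, for tracing by hand

    // Copies up to buffer.length bytes starting at seekPointer; blocks[i] == null
    // models a hole. Returns the number of bytes read (0 at or past end of file).
    static int read(byte[][] blocks, int fileSize, int seekPointer, byte[] buffer) {
        int remaining = Math.min(buffer.length, fileSize - seekPointer);
        if (remaining <= 0) {
            return 0;                                   // buffer is left unmodified
        }
        int done = 0;
        while (done < remaining) {
            int pos = seekPointer + done;
            int blockIndex = pos / BLOCK_SIZE;          // which block of the file
            int within = pos % BLOCK_SIZE;              // offset inside that block
            int chunk = Math.min(BLOCK_SIZE - within, remaining - done);
            byte[] block = blockIndex < blocks.length ? blocks[blockIndex] : null;
            if (block == null) {
                Arrays.fill(buffer, done, done + chunk, (byte) 0);   // holes read as zeros
            } else {
                System.arraycopy(block, within, buffer, done, chunk);
            }
            done += chunk;
        }
        return done;
    }
}
```

The caller would then advance the seek pointer by the returned count. A write loop has the same block arithmetic, with the extra read-before-write step whenever chunk is smaller than a full block.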

When you get done, you should thoroughly test all ten required functions, including creating, reading, writing, closing, reopening, and deleting all sorts of files (small, large, filled with holes, etc.). You should also test the error checking in your code. The main program we supply should be very handy in helping you to do this.

Extra Credits

If you get everything working and thoroughly tested early, you might consider adding the following two extra-credit features.

Maintain in memory a cache of InodeBlocks; the cache size is 4 InodeBlocks. When the program needs to read or write an inode, the program should check whether the corresponding InodeBlock is already in the cache. If it is, the program should read or write the in-memory copy. If not, the program should bring in the corresponding InodeBlock, and then perform the read or write operation. The replacement algorithm for the cache should be LRU.
Dirty InodeBlocks must be written back to disk at shutdown.
Maintain in memory a cache of file data blocks; the size of this cache is 64 blocks. When the program needs to read or write a file data block, the program should check whether the corresponding disk block is already in the cache. If it is, the program should read or write the in-memory copy. If not, the program should allocate a block in the cache, and bring in the corresponding disk block in the case of read or partial writes (total block writes don't need to bring in the disk block). The program should then perform the read or write operation. The replacement algorithm for the cache should be LRU.
Dirty blocks must be written back to disk at shutdown.
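One way to get LRU replacement without writing the bookkeeping by hand is to build on java.util.LinkedHashMap in access-order mode, as in the sketch below. This is a suggestion, not the required design: the class name, the generic value type, and the evicted list (a stand-in for the point where a dirty block would be written back to disk) are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Maps block numbers to cached blocks and evicts the least recently used entry.
class LruBlockCache<V> extends LinkedHashMap<Integer, V> {
    private final int capacity;
    // Records evicted block numbers; a real cache would write dirty blocks back here.
    final List<Integer> evicted = new ArrayList<>();

    LruBlockCache(int capacity) {
        super(16, 0.75f, true);      // third argument true = order entries by access
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true drops `eldest`,
    // which in access-order mode is the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
        if (size() > capacity) {
            evicted.add(eldest.getKey());
            return true;
        }
        return false;
    }
}
```

Both get and put count as a use of an entry, while containsKey does not, so lookups should go through get. At shutdown, any blocks still in the cache and marked dirty would also need to be written out.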

Each of the two parts counts for 10% extra credit for the project.

I cannot stress too strongly, however, that you should not even think of adding these features until the required part of the project is completely written and debugged.

Program Structure

We have provided several files, all of which may be found in the directory


    ~cs537-2/public/project5/

The class Disk is defined in Disk.java. This file also defines the associated classes SuperBlock, Inode, InodeBlock, and IndirectBlock. You may modify or add methods to these classes, but you should not add or remove any data fields.
The file FileSystem.java contains stub (test) versions of the methods you are required to write. Copy this file and edit it to replace the throw statements with code to implement the required functions.
The file Proj5.java contains the main program. It is a simple command interpreter that calls the methods of class FileSystem.
The Makefile can be used to automate compiling and testing your program. The Makefile assumes you write a class called FileTable. You will need to edit the Makefile if you add other classes.
The files test1, test2, etc. are test scripts for testing your program. We may provide additional tests as the due date approaches.

The Command Interpreter

The main method in class Proj5 implements a simple command interpreter. You can either use it interactively by invoking the program as


    java Proj5

or you can have it run a test script by typing, for example,


    java Proj5 test1

Input lines starting with ``/*'' or ``//'' are ignored (the latter are echoed to the output). Other lines have the format


    [ var = ] command [ args ]

The optional prefix var = causes the result of the command to be assigned to a variable. In any case, the result of the command is printed. There is one command for each of the ten methods of class FileSystem as well as three additional commands: help, vars, and quit. The help command prints a list of commands, the vars command lists the current values of all interpreter variables that have been assigned values, and the quit command terminates the program. With the exception of the second argument to write, each argument can be either an integer or the name of a variable. The command


    write fd pattern size

writes size bytes to the indicated file at the current offset. The data is generated by repeating pattern over and over the required number of times.
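The data generation for this command can be pictured with the sketch below. It is a guess at the behavior, not a copy of what Proj5.java does: it assumes the pattern is treated as ASCII text and that the last repetition is cut short when size is not a multiple of the pattern length. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class PatternSketch {
    // Builds `size` bytes by repeating `pattern`, truncating the final copy if needed.
    static byte[] fill(String pattern, int size) {
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        if (p.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = p[i % p.length];    // cycle through the pattern bytes
        }
        return out;
    }
}
```

Under these assumptions, a pattern of abc with a size of 7 would produce the bytes abcabca, which makes it easy to check where each byte landed after reading the file back.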

Grading

You are to prepare a report describing the design and structure of your directory and file system. The report should be not more than two typewritten pages, not including diagrams. You should carefully describe all design decisions you made and explain how these decisions affect the performance of your file/directory system. You may assume that this handout is part of the documentation of your program. Thus you need not repeat information that is in this handout.

Handing In

You must work in groups of 2 for this project.

You should bring your report and all of the .java files you modified (with your additions clearly detailed in your code or in a separate file) to class on the day the project is due.
You should also place a copy of all of the files needed to run your program (.java files, README file, test scripts, and anything else needed in order to run your code) into the hand-in directory of either you or your partner. The hand-in directories for project 5 can be found at:
```
    ~cs537-2/handin/{your-login-name}/p5
```
You should also place in your directory the results of running the standard test scripts. You are encouraged to devise your own tests and hand those in as well.

As always, points will be deducted for code that fails to satisfy the minimal criteria for comments and structure specified in the hand-in directions for project number 2.

¹There is also an artifact of Java here that would not be present in a real system. In Java, the Inode structure is stored in memory as three integers followed by a pointer to an array of thirteen more integers. There would also be additional information to indicate the type of the Inode structure and the size of the array. On disk, however, the Inode structure is simply 16 integers in a row, like the C structure


    struct inode {
        int flags;
        int owner;
        int size;
        int ptr[13];
    };

Unfortunately, there's no easy way to create exactly this structure in memory in Java, but fortunately, you will probably never notice the difference. Similar remarks apply to InodeBlock and IndirectBlock.
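For the curious, the flat 16-integer layout can be reproduced in Java with java.nio.ByteBuffer, as in the sketch below. This only illustrates the on-disk picture; the supplied Disk class already handles the conversion, and the byte order it uses is not stated in this handout, so the big-endian order here (ByteBuffer's default) is an assumption. The class and method names are invented for the example.

```java
import java.nio.ByteBuffer;

class InodeBytesSketch {
    static final int SIZE = 64;   // 16 ints of 4 bytes each, matching Inode.SIZE

    // Lays the fields out as flags, owner, size, ptr[0..12], in that order.
    static byte[] pack(int flags, int owner, int size, int[] ptr) {
        if (ptr.length != 13) {
            throw new IllegalArgumentException("an inode has exactly 13 pointers");
        }
        ByteBuffer buf = ByteBuffer.allocate(SIZE);   // big-endian unless changed
        buf.putInt(flags).putInt(owner).putInt(size);
        for (int p : ptr) {
            buf.putInt(p);
        }
        return buf.array();
    }

    // Reads the 16 ints back in the same order.
    static int[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int[] fields = new int[SIZE / 4];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = buf.getInt();
        }
        return fields;
    }
}
```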

²A real system would need some way to indicate what sort of error occurred. In Unix, the nature of the error is indicated by an integer error code placed in a global variable called errno. For this project, you can just print an error message. A more ``Java-like'' design would use exceptions to indicate errors.

³In real Unix, this array is split into three parts. Each process has its own table of open files. There is a single system-wide table of so-called ``in-core inodes'' shared among all processes. Each entry in this table has a reference count so that it can be removed when the last process closes the file. Seek pointers are kept in yet another system-wide table so that there can be multiple seek pointers into the same file, and multiple processes can share a seek pointer. For this project, you can combine all this information into one table.

⁴In real Unix, the argument is a pathname. The file system uses the directories to translate this name into an inumber.

⁵In real Unix, deletion is delayed until all processes that have the file open close it.

cao@cs.wisc.edu
