CS 537 - Spring 2004
Assignment 5: File Systems, Part II - Inodes and Directories

Due: Friday, May 7 at 1:00 am.

There will be no extensions on the due date for this project.

Last updated: Wed Apr 29 8:30:13 CDT 2004.
There is a FAQ (Frequently Asked Questions) page associated with this project. Please check it frequently for updates.

Introduction
Getting Started
File System API
Implementation
Testing
Implementation Hints
Grading
What to hand in

Introduction

In the previous assignment, you built a buffer cache and disk scheduler for the MiniKernel. In a real OS, you know that users cannot modify blocks willy-nilly -- they must request access to data through the file system. In this assignment, you will start with a fresh Kernel and build a simple file system.

Getting Started

First, download a fresh copy of the MiniKernel sources. This version has a new, improved version of Disk called FastDisk.java. It also has 8 new methods added to Library.java. As written, they all return an error to indicate that they are not yet implemented. It is your job to implement them.

FastDisk.java differs from Disk.java in that it has a buffer cache and elevator scheduling algorithm built in. (Actually, that's a lie. It just doesn't simulate any delays, but you can pretend that it has buffering and scheduling built in.) The methods beginRead and beginWrite of Disk.java have been replaced by new methods read and write that perform the operation so quickly that when they return, the operation is complete. There is no need to do any buffering or scheduling of your own, and no need to deal with disk interrupts.

Although the new disk is fast, it is not very big. All block numbers are represented as short integers, so the largest possible disk has 32,767 blocks (16 megabytes, if BLOCK_SIZE is 512).

File System API

This filesystem will allow users to read, write, create, and delete files on disk. The file system is similar to the Unix file system. Like Unix, a file can have more than one name (hard links) and the system automatically garbage-collects a file when all names for it have been deleted. Unlike Unix, there is just one disk-wide directory, so there is no notion of "current working directory" and "/" has no special meaning in file names. A file name is simply a string of up to 30 characters. Also unlike Unix, there is no notion of "file descriptor". Each read or write system call specifies the name of the file to be accessed.

You must implement the 8 system calls format, create, read, write, link, unlink, list, and sync described below.

Unless specified otherwise, each of the following system calls returns -1 to indicate an error and zero for success.

int format(int dsize, int isize);
A fresh disk could have completely random contents except for block zero, which will be filled with zero bytes. This function initializes the disk so that it is empty and ready to use. The parameters control the allocation of space for inodes and the directory (see the section on implementation below for details). Any function other than format fails (returns -1) if applied to an unformatted disk.
int create(String fname);
Creates a new empty file (size zero) with name fname. It is an error if fname is null, fname indicates an existing file, or fname.length() is greater than 30.
int read(String fname, int offset, byte buffer[]);
Reads up to buffer.length bytes from the file indicated by fname, starting offset bytes from the start of the file. If offset + buffer.length exceeds the current length of the file, this function reads as many bytes as possible, putting them into the beginning of buffer. In particular, if offset is greater than or equal to the file size, no bytes are read and the result is zero. The return value is the number of bytes read, or -1 to indicate errors. It is an error if fname does not indicate an existing file, offset is negative, or buffer is null.
int write(String fname, int offset, byte buffer[]);
Writes the contents of buffer to the file indicated by fname, starting offset bytes from the start of the file. The operation may overwrite existing data in the file and/or append to the end of the file. If offset is greater than the current size of the file, the operation behaves as if the part of the file between the old end of the file and the new data written is filled with zero bytes ((byte) 0). The return value is the number of bytes written (normally buffer.length), or -1 to indicate errors. It is an error if fname does not indicate an existing file, offset is negative, buffer is null, or there is not enough space left on the disk.
int link(String oldName, String newName);
Adds an additional name (link) to an existing file. After this call, both oldName and newName refer to the same file. In a sense, they are aliases for each other. It is an error if oldName does not indicate an existing file or newName is null, indicates an existing file, or is more than 30 characters long.
int unlink(String fname);
Deletes the name fname from the directory. If this is the last name for the file, the file is deleted and all its resources are freed. It is an error if fname does not indicate an existing file.
int list();
Displays the contents of the file system on System.out. There should be one line for each name in the directory. Each line should contain the inumber of a file (see Disk Structures below) and a file name separated by one space. The directory listing should be followed by a listing of the inodes. Each line should include an inumber followed by the seven numbers in the corresponding inode separated by spaces (the file size, the link count, the four data block numbers, and the indirect block number). Only inodes with a non-zero link count should be shown.
int sync();
Flushes all cached information to disk. In particular, the directory needs to be converted to disk format (see below). It is an error if there is not enough space left on disk to write everything out.
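
The return-value rule for read can be summarized in a few lines. The following is only a sketch: the class ReadLength and the method bytesToRead are hypothetical names, not part of the MiniKernel, and the check that fname names an existing file is assumed to happen elsewhere.

```java
/** Hypothetical helper illustrating the byte-count rule for read. */
final class ReadLength {
    /**
     * Returns the value read should produce for a file of the given size:
     * -1 for a negative offset or a null buffer, 0 at or past end of file,
     * otherwise the number of bytes that can actually be copied.
     */
    static int bytesToRead(int fileSize, int offset, byte[] buffer) {
        if (offset < 0 || buffer == null) {
            return -1;
        }
        if (offset >= fileSize) {
            return 0;
        }
        return Math.min(buffer.length, fileSize - offset);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[32];
        System.out.println(bytesToRead(100, 0, buffer));   // buffer is the limit
        System.out.println(bytesToRead(100, 90, buffer));  // end of file is the limit
        System.out.println(bytesToRead(100, 100, buffer)); // nothing left to read
    }
}
```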

Implementation

IMPORTANT: You do not have to worry about race conditions for this project. You may assume there is just one instance of FileTester running, doing one kernel operation at a time. You had enough problems dealing with race conditions on projects 2, 3, and 4.

Disk Structures

Information on the disk consists of a superblock, the directory, the inodes, and other blocks. The superblock is stored in block zero of the disk. It describes the layout of the filesystem. No matter what the geometry of the disk or the size of the filesystem, the OS should be able to read block zero and know exactly how to use the disk. The superblock contains five integers: diskSize is the size of the disk, in blocks; isize is the size of the inode area, in blocks; dsize is the size of the directory area, in blocks; dirents is the number of names in the directory; and freeHead is the block number of the first block in the free list.
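
As an illustration, the superblock could be modeled as below. This is a sketch under assumptions: the class name SuperBlockSketch and its helper methods are hypothetical, BLOCK_SIZE is taken to be 512, and the five fields are assumed to be stored in the order they are listed above. It uses java.nio.ByteBuffer in little-endian mode, which is one alternative to hand-written shift-and-mask code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical in-memory superblock; the on-disk field order is an assumption. */
final class SuperBlockSketch {
    static final int BLOCK_SIZE = 512; // assumed block size

    int diskSize;  // size of the disk, in blocks
    int isize;     // size of the inode area, in blocks
    int dsize;     // size of the directory area, in blocks
    int dirents;   // number of names in the directory
    int freeHead;  // first block of the free list (0 means the list is empty)

    /** Serializes the five fields, little-endian, into a zero-padded block. */
    byte[] pack() {
        byte[] block = new byte[BLOCK_SIZE];
        ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(diskSize).putInt(isize).putInt(dsize)
                .putInt(dirents).putInt(freeHead);
        return block;
    }

    /** Rebuilds a superblock from the contents of block zero. */
    static SuperBlockSketch unpack(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        SuperBlockSketch sb = new SuperBlockSketch();
        sb.diskSize = in.getInt();
        sb.isize = in.getInt();
        sb.dsize = in.getInt();
        sb.dirents = in.getInt();
        sb.freeHead = in.getInt();
        return sb;
    }

    /** First directory block: the inode area starts at block 1. */
    int directoryStart() {
        return 1 + isize;
    }

    /** First block after the superblock, inode area, and directory. */
    int firstDataBlock() {
        return 1 + isize + dsize;
    }

    public static void main(String[] args) {
        SuperBlockSketch sb = new SuperBlockSketch();
        sb.diskSize = 1000;
        sb.isize = 10;
        sb.dsize = 5;
        sb.freeHead = sb.firstDataBlock();
        SuperBlockSketch copy = unpack(sb.pack());
        System.out.println(copy.diskSize + " blocks, free list starts at " + copy.freeHead);
    }
}
```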

The inode area immediately follows the superblock. Each block is packed with 16-byte inodes. The inumber of an inode is its position in the array. For example, the first inode has inumber 0, the first inode in the second block of inodes has inumber BLOCK_SIZE/16, and so on. Each inode contains a (four-byte) integer file size, which indicates the size of the file in bytes, and 6 short (two-byte) integers: a link count and 5 block numbers. The link count indicates the number of directory entries that point at this inode. If the link count is zero, this inode is unused and the remaining fields should be ignored. The first four block numbers are the first four data blocks of the file. The fifth block number is the block number of an indirect block, which contains the block numbers of the remaining data blocks of the file. Since each block number in the indirect block uses two bytes, the maximum size of any file is limited to 4 + BLOCK_SIZE / 2 blocks. In any inode or indirect block, a block number of zero is treated as null, since block zero (the superblock) cannot be part of any file.
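
The arithmetic in the paragraph above can be written out as follows. The class and method names are hypothetical, and BLOCK_SIZE is assumed to be 512 for the example.

```java
/** Hypothetical helper for locating an inode on disk. */
final class InodeLayout {
    static final int BLOCK_SIZE = 512;  // assumed block size
    static final int INODE_SIZE = 16;   // 4-byte size plus six 2-byte fields
    static final int INODES_PER_BLOCK = BLOCK_SIZE / INODE_SIZE;

    /** Disk block that holds the inode; block 0 is the superblock. */
    static int blockOf(int inumber) {
        return 1 + inumber / INODES_PER_BLOCK;
    }

    /** Byte offset of the inode within its block. */
    static int byteOffsetOf(int inumber) {
        return (inumber % INODES_PER_BLOCK) * INODE_SIZE;
    }

    /** Largest file, in blocks: four direct pointers plus one indirect block. */
    static int maxFileBlocks() {
        return 4 + BLOCK_SIZE / 2;
    }

    public static void main(String[] args) {
        int inumber = INODES_PER_BLOCK + 1; // second inode of the second inode block
        System.out.println("block " + blockOf(inumber) + ", byte " + byteOffsetOf(inumber));
        System.out.println("max file size: " + maxFileBlocks() * BLOCK_SIZE + " bytes");
    }
}
```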

The directory immediately follows the inode area. Each block is packed with 32-byte directory entries, each of which consists of a short (two-byte) inumber and a 30-byte name. If the name is less than 30 characters long, it is padded with zero bytes ((byte)0). Only the first superblock.dirents entries are used. The remaining entries may be assumed to be filled with random data.

The remaining blocks of the disk after the superblock, inode area, and directory are all either data or indirect blocks or unused. The unused blocks are linked together in a singly linked list called the free list. The first two bytes of each block in the free list contain the block number of the next block on the list; the last block starts with zero. The freeHead field in the superblock contains the block number of the first block in the free list.
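
A minimal sketch of free-list management is shown below. It is not the required implementation: the class FreeListSketch is hypothetical, the disk is simulated by an in-memory array rather than FastDisk, BLOCK_SIZE is assumed to be 512, and zeroing a block when it is allocated is a design choice made here so that a freshly allocated block reads as zero bytes.

```java
import java.util.Arrays;

/** Hypothetical free-list manager over a simulated in-memory disk. */
final class FreeListSketch {
    static final int BLOCK_SIZE = 512;   // assumed block size

    private final byte[][] disk;         // stand-in for the real disk
    private int freeHead;                // would be stored in the superblock

    /** Chains every block from firstFree to the end of the disk. */
    FreeListSketch(int diskSize, int firstFree) {
        disk = new byte[diskSize][BLOCK_SIZE];
        for (int b = firstFree; b < diskSize; b++) {
            setNext(b, (b + 1 < diskSize) ? b + 1 : 0);
        }
        freeHead = (firstFree < diskSize) ? firstFree : 0;
    }

    /** Removes the first block from the list, or returns -1 if none is left. */
    int allocate() {
        if (freeHead == 0) {
            return -1;
        }
        int b = freeHead;
        freeHead = getNext(b);
        Arrays.fill(disk[b], (byte) 0);  // clear the old link and any stale data
        return b;
    }

    /** Pushes a block back onto the front of the list. */
    void free(int b) {
        setNext(b, freeHead);
        freeHead = b;
    }

    private int getNext(int b) {
        return (disk[b][0] & 0xff) | ((disk[b][1] & 0xff) << 8);
    }

    private void setNext(int b, int next) {
        disk[b][0] = (byte) next;
        disk[b][1] = (byte) (next >> 8);
    }

    public static void main(String[] args) {
        FreeListSketch list = new FreeListSketch(10, 7);
        int first = list.allocate();
        int second = list.allocate();
        list.free(first);
        System.out.println(first + " " + second + " " + list.allocate());
    }
}
```

Pushing freed blocks onto the front of the list keeps both operations constant-time, at the cost of handing blocks out in most-recently-freed order.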

Memory Structures

Although each disk block is read and written as an array of BLOCK_SIZE bytes, you will be converting back and forth between the binary format used on disk and internal data structures. For example, the superblock (block zero of the disk) contains five integers, each represented as four bytes, but in memory, you will want to store these as Java int integers. In this file system, integers are written to disk in "little-endian" format, with the least significant bits first. For example, a short is stored in two bytes, with the low-order 8 bits in the first byte and the high-order 8 bits in the second byte.

You will need to write methods to convert between the two representations. Here are a couple of methods to get you started.


    /** Store a 16-bit integer into a byte array.
     * @param n the integer to be stored
     * @param buf the byte array into which it should be stored
     * @param offset the index of the first byte to be modified
     */
    static void pack(short n, byte[] buf, int offset) {
        buf[offset] = (byte) n;
        buf[offset + 1] = (byte) (n >> 8);
    }

    /** Convert a field in a byte array to an integer.
     * @param buf the byte array containing the data.
     * @param offset the location in the array where the data starts.
     * @return the short integer value.
     */
    static short unpackShort(byte[] buf, int offset) {
        return (short) (
            (buf[offset] & 0xff)
            + ((buf[offset + 1] & 0xff) << 8)
        );
    }

You will want to write similar methods to pack and unpack (4-byte) integers, inodes, and indirect blocks. In memory you may want to store the information from an inode in a structure like this.


/** In-memory representation of an inode */
public class Inode {
    /** "Logical" size of this file, in bytes */
    public int length;

    /** Number of directory entries pointing to this inode.
     * Zero means this inode is unused (and other fields should be ignored).
     */
    public short linkCount;

    /* Pointers to the first four data blocks */
    public short data[] = new short[4];

    /* Pointer to an indirect block */
    public short indirect;

    /** Pack this Inode into a byte array for writing to disk.
     * @param buf    an array to be written to disk.
     * @param offset the offset in buf, in units of inodes (the first inode
     *               goes at offset 0, the second at offset 1, etc.)
     */
    public void pack(byte[] buf, int offset) { ... }

    /** Initialize this Inode from information read from disk.
     * @param buf    an in-memory copy of a disk block.
     * @param offset the offset in buf, in units of inodes (the first inode
     *               goes at offset 0, the second at offset 1, etc.)
     */
    public void unpack(byte[] buf, int offset) { ... }
}
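
One way the pack and unpack methods might be filled in is sketched below. The class is named InodeSketch to make clear that it is illustrative only. It assumes the fields are laid out in the order size, link count, four data block numbers, indirect block number, and it bundles its own little-endian helpers so that it is self-contained.

```java
/** Hypothetical filled-in inode; the on-disk field order is an assumption. */
final class InodeSketch {
    static final int INODE_SIZE = 16;

    int length;
    short linkCount;
    short[] data = new short[4];
    short indirect;

    /** Writes this inode into slot `offset` (counted in inodes) of buf. */
    void pack(byte[] buf, int offset) {
        int p = offset * INODE_SIZE;
        packInt(length, buf, p);
        packShort(linkCount, buf, p + 4);
        for (int i = 0; i < data.length; i++) {
            packShort(data[i], buf, p + 6 + 2 * i);
        }
        packShort(indirect, buf, p + 14);
    }

    /** Loads this inode from slot `offset` (counted in inodes) of buf. */
    void unpack(byte[] buf, int offset) {
        int p = offset * INODE_SIZE;
        length = unpackInt(buf, p);
        linkCount = unpackShort(buf, p + 4);
        for (int i = 0; i < data.length; i++) {
            data[i] = unpackShort(buf, p + 6 + 2 * i);
        }
        indirect = unpackShort(buf, p + 14);
    }

    static void packShort(short n, byte[] buf, int offset) {
        buf[offset] = (byte) n;
        buf[offset + 1] = (byte) (n >> 8);
    }

    static short unpackShort(byte[] buf, int offset) {
        return (short) ((buf[offset] & 0xff) | ((buf[offset + 1] & 0xff) << 8));
    }

    /** Stores a 32-bit integer, least significant byte first. */
    static void packInt(int n, byte[] buf, int offset) {
        for (int i = 0; i < 4; i++) {
            buf[offset + i] = (byte) (n >>> (8 * i));
        }
    }

    /** Reads a 32-bit integer stored least significant byte first. */
    static int unpackInt(byte[] buf, int offset) {
        int n = 0;
        for (int i = 0; i < 4; i++) {
            n |= (buf[offset + i] & 0xff) << (8 * i);
        }
        return n;
    }
}
```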

A directory entry contains a short int inumber and a name of up to 30 characters. On disk, a directory entry is 32 bytes long. The integer is packed into the first two bytes and the characters of the name are packed into the remaining bytes, one character per byte, and padded with zero bytes as necessary. Java characters are 16 bits long, but unless you use names with foreign characters in them, the high 8 bits of each character will be zero. You might want to convert a string from memory to disk representation with code like this.


    static void pack(String name, byte[] buf, int offset) {
        int i;
        for (i = 0; i < name.length(); i++) {
            buf[i + offset] = (byte) name.charAt(i);
        }
        for ( ; i < 30; i++) {
            buf[i + offset] = 0;
        }
    }

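The conversion in the other direction is similar, but note that Java does not implicitly convert a byte to a char: an explicit (char) cast is required, and masking the byte with 0xff first keeps values of 0x80 and above from being sign-extended.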
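
A sketch of the reverse conversion follows. The class NameCodec and the method unpackName are hypothetical names. The code uses an explicit (char) cast, which Java requires when converting a byte to a char, and masks with 0xff so that a byte of 0x80 or above does not become a char with its high bits set.

```java
/** Hypothetical helper that converts a packed name back into a String. */
final class NameCodec {
    static final int NAME_LENGTH = 30;

    /** Reads up to 30 bytes starting at offset, stopping at the zero padding. */
    static String unpackName(byte[] buf, int offset) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < NAME_LENGTH; i++) {
            byte b = buf[offset + i];
            if (b == 0) {
                break;                       // padding marks the end of the name
            }
            name.append((char) (b & 0xff));  // mask to avoid sign extension
        }
        return name.toString();
    }

    public static void main(String[] args) {
        byte[] entry = new byte[32];
        entry[2] = 'f';
        entry[3] = 'o';
        entry[4] = 'o';
        System.out.println(unpackName(entry, 2));
    }
}
```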

Caching

In a real operating system, caching is used extensively to improve performance. For example, when a file is opened, a copy of its inode is read from disk and cached in memory. All operations on the file then access and modify this so-called "in-core inode". If the inode is modified, it may be written back to disk when the file is closed, or perhaps sooner if the operating system wants to limit the damage caused by a system crash. Since multiple processes may open the same file concurrently, the OS might keep a reference count so that the in-core inode can be discarded when the last process closes the file.

In this project there is no open or close system call, and performance is not a primary concern, so you should not worry about caching. You should read in the appropriate inode on each system call, and write it back to disk if it was changed. However, be careful: You cannot write part of a disk block, so to modify a single inode, you will need to read in the entire block containing the inode, use Inode.pack to update it, and then write it back out.

On the other hand, you will probably find it easier to cache the entire directory, representing it in memory as a HashMap that maps String keys to Integer inumbers. The sync system call converts this data structure to disk format, writes it to the directory blocks on disk, and updates the superblock's dirents field.
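
One possible shape for the conversion that sync performs is sketched below. Everything here is an assumption made for illustration: the class DirectoryCodec, a BLOCK_SIZE of 512, and the choice to return null when the names do not fit in the directory area. Names are assumed to have been checked against the 30-character limit already.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical converter between a cached directory and its disk blocks. */
final class DirectoryCodec {
    static final int BLOCK_SIZE = 512;   // assumed block size
    static final int ENTRY_SIZE = 32;    // 2-byte inumber plus 30-byte name
    static final int ENTRIES_PER_BLOCK = BLOCK_SIZE / ENTRY_SIZE;

    /** Returns dsize directory blocks, or null if the entries do not fit. */
    static byte[][] pack(Map<String, Integer> dir, int dsize) {
        if (dir.size() > dsize * ENTRIES_PER_BLOCK) {
            return null;
        }
        byte[][] blocks = new byte[dsize][BLOCK_SIZE]; // zero-filled, so names are padded
        int n = 0;
        for (Map.Entry<String, Integer> e : dir.entrySet()) {
            byte[] block = blocks[n / ENTRIES_PER_BLOCK];
            int p = (n % ENTRIES_PER_BLOCK) * ENTRY_SIZE;
            int inumber = e.getValue();
            block[p] = (byte) inumber;
            block[p + 1] = (byte) (inumber >> 8);
            String name = e.getKey();
            for (int i = 0; i < name.length(); i++) {
                block[p + 2 + i] = (byte) name.charAt(i);
            }
            n++;
        }
        return blocks;
    }

    /** Rebuilds the map from the first dirents entries of the directory area. */
    static Map<String, Integer> unpack(byte[][] blocks, int dirents) {
        Map<String, Integer> dir = new LinkedHashMap<>();
        for (int n = 0; n < dirents; n++) {
            byte[] block = blocks[n / ENTRIES_PER_BLOCK];
            int p = (n % ENTRIES_PER_BLOCK) * ENTRY_SIZE;
            int inumber = (block[p] & 0xff) | ((block[p + 1] & 0xff) << 8);
            StringBuilder name = new StringBuilder();
            for (int i = 0; i < 30 && block[p + 2 + i] != 0; i++) {
                name.append((char) (block[p + 2 + i] & 0xff));
            }
            dir.put(name.toString(), inumber);
        }
        return dir;
    }
}
```

After a successful pack, the caller would write the blocks to the directory area and store dir.size() in the superblock's dirents field.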

Lazy Allocation

The structures on disk used to represent a file can accommodate fairly large files, but are designed so that small files use very little disk space. Blocks should be added to a file only when required to satisfy a write request. When a new file is created, all it needs is an inode. All the pointers in the inode should be initialized to zero (null) to indicate that no blocks are allocated. If an application writes a small amount of data at offset 0, you will allocate a block and make inode.data[0] point to it. Similarly, indirect blocks should be allocated only as needed. The first write to an offset greater than or equal to 4 * BLOCK_SIZE bytes from the start of a file will require you to allocate two blocks, one to hold the data and one for an indirect block to point to it.

There's one subtlety to this "lazy allocation" principle. Suppose the first write to a file writes three bytes at offset BLOCK_SIZE:


    byte[] buf = { 1, 2, 3 };
    write(fname, BLOCK_SIZE, buf);

The definition of write above says the result should appear as if the file has length BLOCK_SIZE + 3, with the first BLOCK_SIZE bytes having a value of 0. In other words, the file should look as if the code were


    byte[] zeros = new byte[BLOCK_SIZE];
    write(fname, 0, zeros);
    byte[] buf = { 1, 2, 3 };
    write(fname, BLOCK_SIZE, buf);

and the file contained two blocks. However, you should not allocate two blocks in this case. Leave inode.data[0] null and only assign a block to inode.data[1]. When you process a read call, treat a null pointer "inside" the file as if it were a pointer to a block full of zero bytes. For example, if the application calls read(fname, 17, buf) where buf.length == 10, you will look at inode.data[0], see that it is null, and simply set buf[i] = 0 for i = 0,...,9. On the other hand, if the application calls write(fname, 17, buf), you will have to allocate a block to "fill in the hole".

In summary, on read a null pointer is treated like a pointer to a block of zero bytes, but on write a null pointer is replaced by a pointer to a newly allocated block. This same idea applies to indirect blocks. If the first operation on a new file is a write, only allocate data and indirect blocks as necessary to perform the operation. A subsequent read may discover that inode.data[i] or inode.indirect is null. Act as if you were able to walk all the way down to a data block and the data block was filled with zero bytes.
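
The read side of this rule can be sketched as follows. The class HoleAwareLookup and its methods are hypothetical, the disk is simulated by an in-memory array, and BLOCK_SIZE is assumed to be 512. The caller is assumed to pass the contents of the indirect block, or null when inode.indirect is zero.

```java
/** Hypothetical block lookup that treats null pointers as holes. */
final class HoleAwareLookup {
    static final int BLOCK_SIZE = 512;  // assumed block size

    /**
     * Returns the disk block holding file block `index`, or 0 for a hole.
     * indirectBlock is the indirect block's contents, or null if there is none.
     */
    static int blockFor(short[] data, byte[] indirectBlock, int index) {
        if (index < data.length) {
            return data[index] & 0xffff;
        }
        if (indirectBlock == null) {
            return 0;                   // no indirect block: everything past data[3] is a hole
        }
        int p = 2 * (index - data.length);
        return (indirectBlock[p] & 0xff) | ((indirectBlock[p + 1] & 0xff) << 8);
    }

    /** Returns a copy of one file block for reading; a hole reads as zero bytes. */
    static byte[] readFileBlock(byte[][] disk, short[] data, byte[] indirectBlock, int index) {
        int b = blockFor(data, indirectBlock, index);
        return (b == 0) ? new byte[BLOCK_SIZE] : disk[b].clone();
    }

    public static void main(String[] args) {
        byte[][] disk = new byte[16][BLOCK_SIZE];
        disk[9][0] = 1;                       // the file's only allocated block
        short[] data = { 0, 9, 0, 0 };        // data[0] is a hole, data[1] is block 9
        System.out.println(readFileBlock(disk, data, null, 0)[0]); // byte from the hole
        System.out.println(readFileBlock(disk, data, null, 1)[0]); // byte from block 9
    }
}
```

A write path would use the same lookup but allocate a block (and, if needed, the indirect block) whenever the result is 0.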

Testing

The Shell of project 4 is replaced in this project with a program called FileTester.java, which is a command interpreter specifically designed for testing your file system. This program is meant to be specified as the "shell" to the Kernel by typing


    java Boot cacheSize FastDisk size FileTester

where size is the size of the simulated disk, in blocks, and cacheSize is any integer (it is ignored for this project). For example, you might try


    java Boot 1 FastDisk 1000 FileTester

If a (Unix) file named DISK exists in the current directory, it should be the result of an earlier run with the same size parameter. Otherwise, a new DISK will be created. The first block will be filled with zero bytes, and the rest will contain random data. In this case, you should be careful that the first command you type is format.

You can also run the program to take its commands from a script, as in


    java Boot 1 FastDisk 1000 FileTester test1.script

Input lines starting with "/*" or "//" are ignored (the latter are echoed to the output). Other lines have the format


    command [ args ]

There is one command for each of the eight system calls, plus one extra version of write called writeln, which takes its input from the lines following the command, and a few other miscellaneous commands.

There are also a few special cases.

The help command prints a list of all the commands and their arguments.
The quit command terminates the program (as does end-of-file).
write fname offset bytes pattern (where pattern is a string of non-blank characters) creates an array buf of length bytes and fills it with bytes created by repeating pattern as many times as necessary and converting all the characters to bytes. It then calls write(fname, offset, buf).
writeln fname offset is similar, but the data is taken from the lines following the command, up to but not including a line containing only a single period ("."). The lines are terminated with newline characters ('\n') and concatenated, and the number of bytes written is the length of the resulting string.
read fname offset bytes reads the indicated portion of a file and displays the data.
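
The pattern filling described for the write command can be pictured with the sketch below. It is not FileTester's actual source: the class PatternFill and the method fill are hypothetical, and pattern is assumed to be non-empty.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of how a pattern is repeated to fill a buffer. */
final class PatternFill {
    /** Builds an array of the given length by cycling through pattern. */
    static byte[] fill(int bytes, String pattern) {
        byte[] buf = new byte[bytes];
        for (int i = 0; i < bytes; i++) {
            buf[i] = (byte) pattern.charAt(i % pattern.length());
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = fill(7, "abc");
        System.out.println(new String(buf, StandardCharsets.US_ASCII));
    }
}
```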

Make sure to test every method listed above, and consider the following:

Is the correct data returned? That is, if you write to a file, you should be able to read the same data back.
Are appropriate errors produced? For example, a negative offset should cause an error. Reading or writing an fname that has not been created should be an error. There are many more errors to consider -- be thorough!
Is all the necessary data sent to disk? If you create a filesystem and some files, you should be able to "reset" the computer and continue where you left off. You can test this by creating some files in one run of the program and then reading them back in another run. When you restart the Kernel, the Disk reloads its contents from DISK, and the filesystem should be usable.

Implementation Hints

There is a FAQ (Frequently Asked Questions) page associated with this project. Please check it frequently for updates.

Although this is a large project, it should be manageable if you break it down into small pieces. Here is one way (but not the only possible way!) to decompose the problem. The tasks are listed roughly in the order they are needed, although in some cases they are inter-dependent.

System calls: Update the Kernel to accept the new system calls, but "stub them out" so that they only print a debugging message. Edit Library.java to replace the System.err.println calls with appropriate calls to Kernel.interrupt.
Free-space management: Write methods for allocating and freeing disk blocks. Implement and test Library.format.
Data Structures: Implement data structures for the in-memory versions of inodes and the directory.
Disk Structures: Write the methods for converting between memory format and disk format (byte[BLOCK_SIZE]). Start with int and short. Gradually add methods to convert other data structures after you write the code to manipulate them in memory. Write the code to print these data structures in the format specified for the list system call.
Directory: Write the code to find the inumber of a file from its name. Implement the methods link and unlink. Write methods to save the directory to disk and to restore it from disk at startup.
Block access within a file: Write code to read or write a sequence of bytes within a block on disk, given the block number and the start and length of the region. Note that you can only read or write whole blocks, so to modify part of a block, you need to read it in, modify the in-memory copy, and write it back out. Then write a method that takes an Inode and a block-offset within the file and returns the block number of the corresponding block. It should return an error if the block does not exist. You might want first to write and debug this method for "small" files (no indirect blocks). Then write code that allocates missing blocks instead of returning an error. Finally, enhance these methods to support indirect blocks.
Accessing inodes: You will need methods to read a specific inode from disk (given its inumber) and to write back a modified inode. You will also need methods to allocate inodes and to release all the blocks of a file (including the indirect block if any) when the link count goes to zero.
Reading and writing arbitrary ranges: At this point, implementing read and write should not be too hard. An individual read request may touch parts of several blocks, so you will need a loop that reads each of the blocks and copies the appropriate portion of it into the appropriate part of the buffer argument of the read call. The implementation of write is slightly more complicated because if a block is only partially modified, you have to read its old value, copy data from the client's buffer into the appropriate portion of the block, and then write it back out.
Test. Test. Test.
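
The loop described under reading and writing arbitrary ranges comes down to cutting a byte range into per-block pieces. The sketch below shows only that arithmetic. The class RangeSplitter is hypothetical and BLOCK_SIZE is assumed to be 512. Each piece is reported as { block index within the file, offset within that block, length, offset within the caller's buffer }.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a byte range into per-block pieces. */
final class RangeSplitter {
    static final int BLOCK_SIZE = 512;  // assumed block size

    /** Splits [offset, offset + length) at block boundaries. */
    static List<int[]> split(int offset, int length) {
        List<int[]> chunks = new ArrayList<>();
        int done = 0;                               // bytes handled so far
        while (done < length) {
            int pos = offset + done;                // current position in the file
            int blockIndex = pos / BLOCK_SIZE;
            int within = pos % BLOCK_SIZE;
            int n = Math.min(BLOCK_SIZE - within, length - done);
            chunks.add(new int[] { blockIndex, within, n, done });
            done += n;
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (int[] c : split(500, 600)) {
            System.out.println("file block " + c[0] + ": " + c[2]
                    + " bytes at block offset " + c[1] + ", buffer offset " + c[3]);
        }
    }
}
```

For a read, each piece is copied from the block (or from zeros, for a hole) into the buffer. For a write, a piece shorter than BLOCK_SIZE is the case that needs the read-modify-write step.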

Grading

Your grade will be 80% correctness and 20% style. Don't forget the following:

Implement algorithms as described.
Make your code readable, robust, and reusable.
Test a variety of situations, including error conditions.
Consider your test results -- do they make sense?

What to hand in

Copy into your handin directory a complete set of all the .java files needed to run your program and any test scripts you created. Also include a file named README containing a line of the form Partner: login1 login2 and a brief description of your test scripts.
