Project 5: An Extent-based File System

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu .

Due: Wed, 11/30, by whenever

Notes

Slight addition: You may also want to track one more thing in each extent entry of the inode: the logical offset within the file to which the extent refers. Specifically, if the extent points to block 500 on disk and is of length 3, you might also want to record which part of the file it covers. If, for example, this extent held the first three blocks of the file, you would record a logical offset of 0 for it. Having this logical offset makes it easier to properly support lseek(), which allows you to write files out with holes in them (e.g., write() a block, lseek() over a block, and then write() another block).
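As a rough illustration of why the logical offset helps, here is a minimal sketch of a lookup that maps a file block number to a disk block. The struct layout, the field names, and the name extent_lookup are assumptions made for this example, not part of xv6. The sketch returns 0 for a hole, assuming block 0 is never a data block.

```c
#include <stdint.h>

/* Hypothetical in-memory extent record; field names are illustrative. */
struct extent {
  uint32_t start;    /* first disk block of the extent */
  uint32_t len;      /* number of consecutive blocks; 0 marks an unused slot */
  uint32_t logical;  /* file block number that the first block maps to */
};

/*
 * Translate file block number fbn into a disk block address.
 * Returns 0 when no extent covers fbn, i.e. the block lies in a hole
 * or past the end of the file.
 */
uint32_t
extent_lookup(const struct extent *ext, int n, uint32_t fbn)
{
  for (int i = 0; i < n; i++) {
    if (ext[i].len == 0)
      continue;                      /* unused slot */
    if (fbn >= ext[i].logical && fbn - ext[i].logical < ext[i].len)
      return ext[i].start + (fbn - ext[i].logical);
  }
  return 0;
}
```

With the extent from the note (start 500, length 3, logical offset 0), file block 2 maps to disk block 502. If a second extent starts at logical offset 4, file block 3 is a hole and the lookup returns 0.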

Overview

In this project, you'll be changing the existing xv6 file system to use extents rather than pointers. An extent is a pointer-length pair; the pointer is just a disk sector address, and the length is how many consecutive blocks, starting at that address, are part of this extent.

A lot of modern file systems use extents because they are compact, and because file systems like to allocate blocks consecutively on disk anyhow for performance reasons.

Specifically, you'll do four things. First, you'll have to decide exactly on what your inode structure will be. Some recommendations are found below.

Second, you'll have to change the file system implementation to allocate new files (and possibly directories) in a manner that uses extents, not just pointers. Note, however, that you will maintain the old pointer-based code for backwards compatibility and simplicity (details below).

Third, for information purposes (and for great demos), you will also modify the stat system call to dump information about each (pointer, length) pair. Thus, you should write a little program that, given a file name, not only prints out the file's size, etc., but also the address and lengths of each extent, if the file is an extent-based file. We'll use this program, in particular, to do a final demo of this project.

Fourth, you will add a new system call to the file system, lseek(). Our version of lseek takes just two arguments: a file descriptor and an offset in bytes (e.g., lseek(int fd, int offset)). It should set the current offset of the open file referred to by fd to the specified offset; upon success, it should return the offset; upon failure, it should return -1. Application programs can use this to read or write arbitrary locations within a file.
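To make the expected return values concrete, here is a small user-space model of the two-argument lseek. It is only a sketch: struct ofile is a simplified, hypothetical stand-in for the kernel's open-file structure, NOFILE is an assumed table size, and rejecting negative offsets is an assumption of this example rather than something the project text requires.

```c
#define NOFILE 16   /* assumed number of descriptors per process */

/* Hypothetical, simplified stand-in for the kernel's open-file structure. */
struct ofile {
  int inuse;          /* nonzero if this descriptor is open */
  unsigned int off;   /* current byte offset into the file */
};

/*
 * Model of lseek(fd, offset): set the offset of the open file referred
 * to by fd and return it, or return -1 on failure.
 */
int
lseek_model(struct ofile *table, int fd, int offset)
{
  if (fd < 0 || fd >= NOFILE || !table[fd].inuse)
    return -1;                 /* not a valid open descriptor */
  if (offset < 0)
    return -1;                 /* assumption: negative offsets are errors */
  table[fd].off = (unsigned int)offset;
  return offset;
}
```

In the kernel, the same checks would sit behind the usual system call argument fetching, and the offset would be stored in the real open-file structure that read() and write() already consult.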

Details

To begin, the first thing to do is to understand how the file system is laid out on disk. It is actually quite a simple layout, much like the old Unix file system we have discussed in class. As it says at the top of fs.c, there is a superblock, then some inodes, a single block in-use bitmap, and some data blocks. That's it!

What you then need to understand is what each inode looks like. Currently, it is quite a simple inode structure (found in fs.h), containing a file type (regular file or directory), a major/minor device type (so we know which disk to read from), a reference count (so we know how many directories the file is linked into), a size (in bytes), and then NDIRECT+1 block addresses. The first NDIRECT addresses are direct pointers to the first NDIRECT blocks of the file (if the file is that big). The last slot is reserved for larger files that need an indirect block; thus, this last element of the array is the disk address of the indirect block. The indirect block itself, of course, may contain many more pointers to blocks for these larger files.

The change we suggest you make is very simple, conceptually. One simple implementation is to keep the inode structure exactly as is, but to use the slots for direct pointers as (pointer, length) pairs. Specifically, use each direct pointer slot (which is 4 bytes) as 3 bytes of pointer and 1 byte of length. This limits the size of each extent (to 2^8) as well as how many disk addresses one can refer to (to 2^24), but that is probably OK for this project.
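One possible way to split a 4-byte slot is sketched below. The choice to keep the address in the high 24 bits and the length in the low 8 bits, and all of the names (ext_pack, ext_addr, ext_len, EXT_MAX_LEN, EXT_MAX_ADDR), are assumptions of this example; any consistent encoding would do.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_LEN_BITS 8
#define EXT_MAX_LEN  ((1u << EXT_LEN_BITS) - 1)  /* largest value one byte holds */
#define EXT_MAX_ADDR ((1u << 24) - 1)            /* largest 3-byte block address */

/* Pack a block address and a length into one 4-byte slot. */
uint32_t
ext_pack(uint32_t addr, uint32_t len)
{
  assert(addr <= EXT_MAX_ADDR);
  assert(len <= EXT_MAX_LEN);
  return (addr << EXT_LEN_BITS) | len;
}

/* Extract the block address (high 24 bits) from a slot. */
uint32_t
ext_addr(uint32_t slot)
{
  return slot >> EXT_LEN_BITS;
}

/* Extract the length (low 8 bits) from a slot. */
uint32_t
ext_len(uint32_t slot)
{
  return slot & EXT_MAX_LEN;
}
```

With this encoding, an all-zero slot has length 0 and can double as the marker for an unused slot, which means the longest extent it can describe is 255 blocks.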

It is also OK not to have an indirect block for extent-based files, just the NDIRECT (pointer, length) pairs.

One major obstacle with any file system is how to boot and test the system with the new file system; if you just change a bunch of stuff, and make a mistake, the system won't be able to boot, as it needs to be able to read from disk in order to function (e.g., to start the shell executable).

To overcome this, your file system will support two types of files: the existing pointer-based structures, and the new extent-based structures. The way to add this is to add a new file type (see stat.h for the ones that are already there like T_DIR for directories, and T_FILE for files). Let's call this type T_EXTENT. Thus, for regular files (T_FILE), you just use the existing pointer-based code; however, when somebody allocates an extent-based file (T_EXTENT), you should use your new extent-based code.

To create an extent-based file, you'll have to modify the interface to file creation, perhaps adding an O_EXTENT flag to the open() system call, which normally creates files. When such a flag is present, your file system should create an extent-based file, with all of the expected changes as described above. Of course, various routines deeper in the file system will have to be modified so that they are passed the new flag and use it accordingly.
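The flag-to-type decision might look roughly like the sketch below. The numeric values chosen for O_EXTENT and T_EXTENT and the helper names create_type and uses_extents are hypothetical; the other constants are assumed to match your copies of fcntl.h and stat.h, so check them against your tree before reusing anything.

```c
/* Assumed to match the existing definitions in fcntl.h and stat.h. */
#define O_CREATE 0x200
#define T_DIR    1
#define T_FILE   2
#define T_DEV    3

/* Hypothetical additions; pick any values not already in use. */
#define O_EXTENT 0x400
#define T_EXTENT 4

/* Choose the inode type for a file being created by open(). */
int
create_type(int omode)
{
  return (omode & O_EXTENT) ? T_EXTENT : T_FILE;
}

/* Decide which block-mapping code path an inode of this type should take. */
int
uses_extents(short type)
{
  return type == T_EXTENT;
}
```

A user program would then request an extent-based file with something like open(path, O_CREATE | O_EXTENT | O_RDWR), and the read and write paths would branch on uses_extents() wherever they map file blocks to disk blocks.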

There is no need to do this for directories, but if you want to, you'll have to change mkdir() too to pass in a flag to make the directory extent-based as well.

Once you have all of this in mind, you'll have to start changing some code. If you follow the guidelines above carefully (which provide backwards compatibility for the old file system type), you really won't have to change mkfs. You might still be curious what it does: it writes an empty file system to the image file fs.img, which xv6 then uses as its root file system. Beyond a root directory, the mkfs tool also puts all of the binaries (like ls, cat, etc.) into the file system image, which are then available in the file system when you boot xv6. In doing all of this, mkfs allocates inodes and data blocks, and updates the root directory as you would expect. If you really wanted to, you could change it to allocate some extent-based files, but there is no need to.

Of course the real challenge is getting into the file system code and making your extent-based modifications. To understand how it works, you should follow the paths for the read() and write() system calls. They are not too complex, and will eventually lead you to what you are looking for. Hint: at some point, you should probably be staring pretty hard at routines like bmap() .
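To give a feel for what an extent-aware block-mapping routine has to do, here is a self-contained user-space sketch. It is not xv6 code: struct xinode, ext_bmap, and toy_balloc are hypothetical names, NDIRECT is an assumed value, the slot encoding (address in the high 24 bits, length in the low 8) is one possible choice, and the sketch only supports appending one block past the current end of the file (no holes).

```c
#include <stdint.h>

#define NDIRECT 12                /* assumed; use the value from your fs.h */
#define MAXLEN  255u              /* largest length a 1-byte field can hold */
#define SLOT(a, l)   (((uint32_t)(a) << 8) | (l))
#define SLOT_ADDR(s) ((s) >> 8)
#define SLOT_LEN(s)  ((s) & 0xffu)

struct xinode {
  uint32_t addrs[NDIRECT];        /* packed (address, length) slots */
};

/* Toy stand-in for the block allocator: hands out increasing block numbers. */
uint32_t toy_next = 100;
uint32_t toy_balloc(void) { return toy_next++; }

/*
 * Return the disk block holding file block bn, allocating it if bn is
 * exactly one past the current end of the file. Returns 0 on failure.
 */
uint32_t
ext_bmap(struct xinode *ip, uint32_t bn)
{
  uint32_t base = 0;              /* file block number where slot i begins */
  int i;

  for (i = 0; i < NDIRECT && SLOT_LEN(ip->addrs[i]) != 0; i++) {
    uint32_t len = SLOT_LEN(ip->addrs[i]);
    if (bn < base + len)
      return SLOT_ADDR(ip->addrs[i]) + (bn - base);
    base += len;
  }
  if (bn != base)
    return 0;                     /* sketch: holes are not supported */

  uint32_t b = toy_balloc();
  if (i > 0) {
    uint32_t last = ip->addrs[i - 1];
    if (SLOT_ADDR(last) + SLOT_LEN(last) == b && SLOT_LEN(last) < MAXLEN) {
      ip->addrs[i - 1] = last + 1;  /* contiguous: grow the last extent */
      return b;
    }
  }
  if (i == NDIRECT)
    return 0;                     /* out of slots; real code would free b */
  ip->addrs[i] = SLOT(b, 1);      /* not contiguous: start a new extent */
  return b;
}
```

The key design point is the contiguity check: when the allocator happens to return the block right after the last extent, the extent simply grows by one, and only a non-contiguous allocation consumes a new slot.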

Finally, you'll have to modify the stat() system call to return information about the actual disk addresses in the inode (stat() currently doesn't return such information). Also create a new program, called (confusingly) stat, which can be called like this: stat pathname. When run in such a manner, the stat program should print out all the information about a file, including its type, size, and information about its extents (or direct pointers, if it's a pointer-based file). Use this to show that your extent-based file system makes extents!
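The output side of such a program could be sketched as follows. Everything here is an assumption made for illustration: struct xstat is a hypothetical extended stat structure carrying the raw slots, T_EXTENT has a made-up value, NDIRECT is an assumed value, and the slot encoding (address in the high 24 bits, length in the low 8) is just one possible choice. The sketch uses the host C library's snprintf and printf so that it compiles outside xv6; inside xv6, user-level printf takes a file descriptor as its first argument.

```c
#include <stdio.h>
#include <stdint.h>

#define NDIRECT  12   /* assumed; use the value from your fs.h */
#define T_FILE   2
#define T_EXTENT 4    /* hypothetical new file type */

/* Hypothetical extended stat structure that also carries the raw slots. */
struct xstat {
  short type;
  uint32_t size;
  uint32_t addrs[NDIRECT];
};

/* Format one slot into buf; returns the length snprintf reports. */
int
format_slot(char *buf, size_t n, short type, uint32_t slot)
{
  if (type == T_EXTENT)
    return snprintf(buf, n, "extent: addr %u len %u",
                    (unsigned)(slot >> 8), (unsigned)(slot & 0xff));
  return snprintf(buf, n, "block: addr %u", (unsigned)slot);
}

/* Print type, size, and every slot that is in use. */
void
print_stat(const struct xstat *st)
{
  char line[64];

  printf("type %d size %u\n", st->type, (unsigned)st->size);
  for (int i = 0; i < NDIRECT; i++) {
    if (st->addrs[i] == 0)
      continue;                   /* unused slot */
    format_slot(line, sizeof(line), st->type, st->addrs[i]);
    printf("  %s\n", line);
  }
}
```

Keeping the formatting in its own function makes it easy to check the extent decoding separately from the system call plumbing.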

The Code

The code (and associated README) can be found in ~cs537-1/ta/xv6/. Everything you need to build, run, and even debug the kernel is in there, as before.

You may also find the following readings about xv6 useful, as always, particularly anything about the file system.

Testing

More details will be available soon about grading and testing.

Handing It In

Please use the xv6 directory in p5 to turn in your code. If working as a team of two, please hand in the material in ONE directory, with a README that clearly indicates the names and CS logins of the team members.

Turn in everything needed to build the kernel and associated files; no .o files or executables though, please!

No write-up is required for this project.