Project 7: An Extent-based File System

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu.

Due: Sunday, 5/08, by whenever (however, if you really want to, you can have up until 05/15)

Notes

This project can be done either by yourself or in a team of two.

Overview

In this project, you'll be changing the existing xv6 file system to use extents rather than pointers. An extent is a pointer-length pair; the pointer is just a disk sector address, and the length is how many consecutive blocks, starting at that address, are part of this extent.

A lot of modern file systems use extents because of their compactness, and for the simple fact that file systems like to consecutively allocate blocks on disk anyhow for performance reasons.

Specifically, you'll do three things. First, you'll have to decide exactly on what your inode structure will be. Some recommendations are found below.

Second, you'll have to change the file system implementation to allocate new files (and possibly directories) in a manner that uses extents, not just pointers. Note, however, that you will maintain the old pointer-based code for backwards compatibility and simplicity (details below).

Finally, for informational purposes (and for great demos), you will also modify the stat system call to dump information about each (pointer, length) pair. Thus, you should write a little program that, given a file name, prints out not only the file's size, etc., but also the address and length of each extent, if the file is an extent-based file. We'll use this program, in particular, to do a final demo of this project.

Details

To begin, the first thing to do is to understand how the file system is laid out on disk. It is actually quite a simple layout, much like the old Unix file system we have discussed in class. As it says at the top of fs.c, there is a superblock, then some inodes, a single block in-use bitmap, and some data blocks. That's it!

What you then need to understand is what each inode looks like. Currently, it is quite a simple inode structure (found in fs.h), containing a file type (regular file or directory), a major/minor device type (so we know which disk to read from), a reference count (so we know how many directories the file is linked into), a size (in bytes), and then NDIRECT+1 block addresses. The first NDIRECT addresses are direct pointers to the first NDIRECT blocks of the file (if the file is that big). The last slot is reserved for larger files that need an indirect block; thus, this last element of the array is the disk address of the indirect block. The indirect block itself, of course, may contain many more pointers to blocks for these larger files.

The change we suggest you make is very simple, conceptually. One simple implementation will be to keep the inode structure exactly as is, but to use the slots for direct pointers as (pointer, length) pairs. Specifically, use each direct pointer slot (each of which is 4 bytes) as 3 bytes of pointer and 1 byte of length. This limits the size of each extent (to at most 2^8 blocks) as well as how many disk addresses one can refer to (to 2^24), but that is probably OK for this project.
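To make the 3-byte/1-byte split concrete, here is one way the packing might look. This is only a sketch: the names extent_pack, extent_addr, and extent_len and the EXT_* constants are made up for illustration and are not part of xv6, and putting the length in the low byte is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; nothing here exists in stock xv6. */
#define EXT_LEN_BITS 8u          /* low byte of the slot holds the length   */
#define EXT_LEN_MASK 0xFFu
#define EXT_MAX_LEN  255u        /* largest length that fits in one byte    */
#define EXT_MAX_ADDR 0xFFFFFFu   /* largest address that fits in 24 bits    */

/* Pack a (pointer, length) pair into one 4-byte direct-pointer slot. */
static uint32_t extent_pack(uint32_t addr, uint32_t len)
{
    assert(addr <= EXT_MAX_ADDR);
    assert(len <= EXT_MAX_LEN);
    return (addr << EXT_LEN_BITS) | len;
}

/* Recover the starting disk address from a packed slot. */
static uint32_t extent_addr(uint32_t slot)
{
    return slot >> EXT_LEN_BITS;
}

/* Recover the length (in blocks) from a packed slot. */
static uint32_t extent_len(uint32_t slot)
{
    return slot & EXT_LEN_MASK;
}
```

Storing the length directly, as above, lets a length of 0 mark an unused slot but caps an extent at 255 blocks. Storing length minus one instead would reach the full 2^8, at the cost of needing some other way to mark empty slots.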

It is also OK not to have an indirect block for extent-based files, just the NDIRECT (pointer, length) pairs.
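With only the NDIRECT pairs to consult, finding the disk block behind a given file block amounts to walking the extents in order. The sketch below assumes the layout suggested above (address in the top 3 bytes, length in the low byte, length 0 meaning an unused slot) and assumes NDIRECT is 12, as in stock fs.h; extent_lookup and sample_addrs are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDIRECT 12   /* assumed to match fs.h; check your tree */

/*
 * Map file block number bn to a disk block address by walking the
 * (pointer, length) pairs in order. Returns 0 if bn lies beyond the
 * blocks covered by the extents.
 */
static uint32_t extent_lookup(const uint32_t addrs[NDIRECT], uint32_t bn)
{
    assert(addrs != NULL);
    for (int i = 0; i < NDIRECT; i++) {
        uint32_t start = addrs[i] >> 8;     /* 3-byte pointer */
        uint32_t len   = addrs[i] & 0xFF;   /* 1-byte length  */
        if (len == 0)
            break;                          /* unused slot: no more extents */
        if (bn < len)
            return start + bn;              /* bn falls inside this extent  */
        bn -= len;                          /* skip past this extent        */
    }
    return 0;
}

/* Example inode contents: blocks 100..102, then blocks 200..201. */
static const uint32_t sample_addrs[NDIRECT] = {
    (100u << 8) | 3u,
    (200u << 8) | 2u,
};
```

For the sample above, file block 0 maps to disk block 100, file block 3 maps to disk block 200, and file block 5 is past the end of the file.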

One major obstacle with any file system is how to boot and test the system with the new file system; if you just change a bunch of stuff, and make a mistake, the system won't be able to boot, as it needs to be able to read from disk in order to function (e.g., to start the shell executable).

To overcome this, your file system will support two types of files: the existing pointer-based structures, and the new extent-based structures. The way to add this is to add a new file type (see stat.h for the ones that are already there like T_DIR for directories, and T_FILE for files). Let's call this type T_EXTENT. Thus, for regular files (T_FILE), you just use the existing pointer-based code; however, when somebody allocates an extent-based file (T_EXTENT), you should use your new extent-based code.

To create an extent-based file, you'll have to modify the interface to file creation, perhaps adding an O_EXTENT flag to the open() system call, which normally creates files. When such a flag is present, your file system should create an extent-based file, with all of the expected changes as described above. Of course, various routines deeper in the file system would have to be modified in order to be passed the new flag and use it accordingly.
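The flag-to-type decision itself can be tiny. In the sketch below, the T_DIR, T_FILE, T_DEV, and O_CREATE values are meant to mirror stock stat.h and fcntl.h (double-check them against your tree), while T_EXTENT = 4, O_EXTENT = 0x400, and the helper names are assumptions picked only for illustration.

```c
#include <assert.h>

/* Intended to mirror stock xv6 stat.h and fcntl.h; verify in your tree. */
#define T_DIR    1
#define T_FILE   2
#define T_DEV    3
#define O_CREATE 0x200

/* Hypothetical additions for this project. */
#define T_EXTENT 4
#define O_EXTENT 0x400

/* Pick the inode type for a newly created file from the open() mode. */
static short create_type(int omode)
{
    /* The new flag must not collide with an existing flag bit. */
    assert((O_EXTENT & O_CREATE) == 0);
    return (omode & O_EXTENT) ? T_EXTENT : T_FILE;
}

/* Deeper routines can branch on the inode type like this. */
static int uses_extents(short type)
{
    return type == T_EXTENT;
}
```

The idea is that the file-creation path would hand create_type(omode) to the inode-creation routine where it currently hands over T_FILE, and that block-mapping code would check uses_extents() before choosing between the old and new paths.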

There is no need to do this for directories, but if you want to, you'll have to change mkdir() too to pass in a flag to make the directory extent-based as well.

Once you have all of this in mind, you'll have to start changing some code. If you follow the guidelines above carefully (which provide backwards compatibility for the old file system type), you really won't have to change mkfs. Still, you might be curious what it does: it writes an empty file system to the image file fs.img; xv6 then uses this as its root directory. Beyond a root directory, the mkfs tool also puts all of the binaries (like ls, cat, etc.) into the file system image, which are then available in the file system when you boot xv6. In doing all of this, mkfs allocates inodes and data blocks, and updates the root directory as you would expect. If you really wanted to, you could change it to allocate some extent-based files, but there is no need to.

Of course, the real challenge is getting into the file system code and making your extent-based modifications. To understand how it works, you should follow the paths for the read() and write() system calls. They are not too complex, and will eventually lead you to what you are looking for. Hint: at some point, you should probably be staring pretty hard at routines like bmap().
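One piece of logic you will likely need somewhere near bmap() is deciding, when a file grows by one block, whether to stretch the last extent or start a new one. The following is a rough sketch under the same assumed layout (address in the top 3 bytes, length in the low byte, length 0 meaning unused, NDIRECT assumed to be 12); extent_append is a hypothetical helper, and the block b is assumed to have already been allocated by the usual allocator.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT 12   /* assumed to match fs.h; check your tree */

/*
 * Record that freshly allocated disk block b backs the next file block.
 * Grows the last extent when b is contiguous with it; otherwise starts
 * a new extent in the next free slot.
 * Returns 0 on success, -1 if every slot is in use and b cannot be merged.
 */
static int extent_append(uint32_t addrs[NDIRECT], uint32_t b)
{
    assert(b <= 0xFFFFFFu);                  /* must fit in 3 bytes */
    for (int i = 0; i < NDIRECT; i++) {
        uint32_t start = addrs[i] >> 8;
        uint32_t len   = addrs[i] & 0xFF;
        if (len == 0) {                      /* free slot: start a new extent */
            addrs[i] = (b << 8) | 1u;
            return 0;
        }
        int is_last = (i == NDIRECT - 1) || ((addrs[i + 1] & 0xFF) == 0);
        if (is_last && start + len == b && len < 255) {
            addrs[i] = (start << 8) | (len + 1);   /* grow in place */
            return 0;
        }
    }
    return -1;
}
```

Starting from an empty array, appending blocks 100, 101, and 102 yields a single extent (100, 3); appending block 200 next opens a second extent (200, 1). Of course, extents only get long if the allocator tends to hand out consecutive blocks.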

Finally, you'll have to modify the stat() system call to return information about the actual disk addresses in the inode (stat() currently doesn't return such information). Also create a new program, called (confusingly) stat, which can be called like this: stat pathname. When run in such a manner, the stat program should print out all the information about a file, including its type, size, and information about its extents (or direct pointers, if it's a pointer-based file). Use this to show that your extent-based file system makes extents!
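The printing side of such a program might look roughly like the sketch below. It is written against standard C stdio so it can be tried outside xv6; inside xv6 you would use the user-level printf instead. The struct xstat layout, the T_EXTENT value, and print_layout are all assumptions made for illustration, not the real struct stat.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NDIRECT  12   /* assumed to match fs.h; check your tree */
#define T_FILE   2
#define T_EXTENT 4    /* hypothetical new file type */

/* Hypothetical extended stat result: type and size plus the raw slots. */
struct xstat {
    short    type;
    uint32_t size;
    uint32_t addrs[NDIRECT + 1];
};

/*
 * Print the type and size, then one line per extent (or per direct
 * pointer for a pointer-based file). Returns the number of slots printed.
 */
static int print_layout(FILE *out, const struct xstat *st)
{
    int n = 0;
    assert(out != NULL && st != NULL);
    fprintf(out, "type %d size %u\n", st->type, (unsigned)st->size);
    for (int i = 0; i < NDIRECT; i++) {
        uint32_t slot = st->addrs[i];
        if (st->type == T_EXTENT) {
            if ((slot & 0xFF) == 0)
                break;                       /* length 0: unused slot */
            fprintf(out, "extent %d: addr %u len %u\n", i,
                    (unsigned)(slot >> 8), (unsigned)(slot & 0xFF));
        } else {
            if (slot == 0)
                break;                       /* no block allocated here */
            fprintf(out, "direct %d: addr %u\n", i, (unsigned)slot);
        }
        n++;
    }
    return n;
}
```

For an extent-based file whose first two slots hold (100, 3) and (200, 2), this prints a header line followed by two extent lines; for a pointer-based file it prints one line per direct pointer in use.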

The Code

The code (and associated README) can be found in ~cs537-1/ta/xv6/. Everything you need to build and run and even debug the kernel is in there, as before.

You may also find the following readings about xv6 useful, written by the same team that ported xv6 to x86: chapters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

All of these can be found off the xv6 site.

Particularly useful for this project: Chapters 7, 8, maybe 6.

Testing

Grading for this project will be easy: you will run a demo for us of your new file system, showing off that it is using extents, and not just pointers. Sign up for demo times next week. More details will be given in class.

Handing It In

Even though we are running a demo, please use the p7 directory for your handin, and put your code in there before your demo. If working as a team of two, please hand in the material in ONE directory, with a README that clearly indicates the names and CS logins of the team members. Also, create a soft link to the handin directory of the other team member with the name partner; for example, if the two partners are named joe and jane, and you turn in the project in jane's p7 directory, you should go into jane's p7 directory and type ln -s ~cs537-1/handin/joe/p7 partner to create the soft link.

Turn in all files that you have changed or added (.c and .h files, and possibly a modified Makefile). We should be able to take your handed-in files, add the rest of the source code, and build and run your kernel and any tests you have turned in.

No write-up is required for this project.