Project 5b: File System Integrity
In this project, you'll be changing the existing xv6 file system to add protection from data corruption. In real storage systems, silent corruption of data is a major concern, and thus many techniques are usually put in place to detect (and recover) from blocks that go bad.
Specifically, you'll do three things. First, you'll modify the code to allow the user to create a new type of file that keeps a mirrored copy for every block it points to. As you have studied in class, RAID-1 uses mirroring to allow systems to recover from corruption.
Second, you'll have to change the file system to handle reads and writes differently for files with mirrored copies. Specifically, when writing out a mirrored file, you will have to write each block twice; when reading such a file, you will have to read both blocks, and check that they match. If the two copies don't match, return an error code (-1). In this way, your file system will be able to detect corruption!
Third, for information purposes, you will also modify the
To begin, the first thing to do is to understand how the file system is laid out on disk. It is actually quite a simple layout, much like the old Unix file system we have discussed in class. As it says at the top of fs.c , there is a superblock, then some inodes, a single block in-use bitmap, and some data blocks. That's it!
What you then need to understand is what each inode looks like. Currently, it is quite a simple inode structure (found in fs.h ), containing a file type (regular file or directory), a major/minor device type (so we know which disk to read from), a reference count (so we know how many directories the file is linked into), a size (in bytes), and then NDIRECT+1 block addresses. The first NDIRECT addresses are direct pointers to the first NDIRECT blocks of the file (if the file is that big). The last block is reserved for larger files that need an indirect block; thus, this last element of the array is the disk address of the indirect block. The indirect block itself, of course, may contain many more pointers to blocks for these larger files.
The change we suggest you make is very simple, conceptually. Your implementation should keep the inode structure as is, but to use every pair of slots for a single logical block. This limits how many disk addresses your file system can refer to (to half the previous amount), but that is probably OK for this project.
On subsequent reads of a mirrored file, the file system should read in both
copies of a block, and compare the two. If the copies match, your code should
return as usual; if not, the call to
Note also that large files will also have an indirect pointer allocated to them, and thus the indirect block will be full of direct pointers too; these direct pointers should be treated similar to the ones in the inode.
One major obstacle with any file system is how to boot and test the system with the new file system; if you just change a bunch of stuff, and make a mistake, the system won't be able to boot, as it needs to be able to read from disk in order to function (e.g., to start the shell executable).
To overcome this challenge, your file system will support two types of files: the existing pointer-based structures, and the new mirrored structures. The way to add this is to add a new file type (see stat.h for the ones that are already there like T_DIR for directories, and T_FILE for files). Let's call this type T_MIRRORED (a new file type, with the numeric value of 4). Thus, for regular files (T_FILE), you just use the existing code, without mirroring. However, when somebody allocates an file that has extra protection (T_MIRRORED), you should use your new mirroring-based code.
To create an mirroring-based file, you'll have to modify the interface to
file creation, adding an O_MIRRORED flag (value 0x400) to the
There is no need to do any of this for directories.
Of course the real challenge is getting into the file system code and
making your mirroring-based modifications. To understand how it works, you
should follow the paths for the
Finally, you'll have to modify the
Also create a new program, called filestat, which can be called like this: filestat pathname. When run in such a manner, the filestat program should print out all the information about a file, including its type, size, and all the mirrored blocks it uses. Use this to show that your mirroring-based file system works.
The code (and associated README) can be found in ~cs537-2/ta/xv6/ . Everything you need to build and run and even debug the kernel is in there, as before.
One of the drawbacks of mirroring is that if there is a mismatch between the copies, we don't know which one is correct. Resolve this problem by storing three copies. On reads, if there is a mismatch, resolve the mismatch by figuring out the majority version (2 copies have value X, and 1 copy has value Y), and correcting the minority version (Y) to the majority version (X). Return an error if all three copies have different versions.