
In which sums are checked, and checks are summed.

In this project you'll be adding integrity-checking to the xv6 filesystem to protect it against data corruption by computing checksums of file contents and directory entries.

In part 1 you'll rearrange the filesystem's metadata layout and modify the mkfs command to match.

In part 2 you'll have the filesystem verify the checksums of any data and directory entries it reads from the disk.

In part 3 you'll have the filesystem compute, update, and record new checksums for files and directories created or modified at runtime.

In part 4 you'll modify the fstat system call to add some data integrity information to the metadata it retrieves.

For bonus points, you can implement a double-indirect block to support larger files.


Important Notes

For this project you may work in pairs! See this page for details.

See also this for how to set permissions to collaborate with your partner.

Due: Monday, May 1 at 11:59 PM (Late policy)

By now you should be able to figure out the handin procedure on your own.


Start from this version of xv6!

(It has a fix for an xv6 bug that may be relevant.)


Inode-frobbing

Start out by familiarizing yourself with the layout of the xv6 filesystem. Chapter 6 of the xv6 book is a good place to start for that. Note though that the version of xv6 we're using does not include the logging feature described in the book; you can safely ignore the parts that pertain to that.

Start your xv6 modifications by changing the filesystem's inode structure. Each 4-byte block pointer in the addrs array should be accompanied by a 4-byte checksum of the data contained in the pointed-to block. You do not need to checksum the contents of the indirect block, so you should use a dedicated member of the inode structure to store the pointer to it alone (without a checksum) instead of storing it in the addrs array as xv6 does in its unmodified form.

You'll need to change both the in-memory inode (struct inode in kernel/file.h) and the on-disk inode (struct dinode in include/fs.h). Make sure you do not change the size of the on-disk inode -- it should remain 64 bytes. Because you'll now be using twice as many bytes for each block pointer (4 bytes for the pointer itself and another 4 for the checksum of the block it points to), you'll need to change the definitions of the NDIRECT and NINDIRECT macros. You can check the size of your on-disk inode struct by compiling your code and then running:

$ gdb -batch -ex 'p sizeof(struct dinode)' kernel/kernel

(Maintaining the size of the in-memory inode, struct inode, should follow naturally as well, but isn't as critical.)
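
To make the size arithmetic concrete, here is a rough sketch of one layout that fits in 64 bytes. It assumes the first five fields match the stock xv6 dinode (four shorts and a uint, 12 bytes in total); the names blockref, dinode_sketch and indirect are made up for illustration, and pairing each pointer with its checksum in a small struct is only one option (two parallel arrays would work as well).

```c
#include <assert.h>

typedef unsigned int uint;

#define BSIZE 512   // xv6 block size in bytes

// Hypothetical pairing of a block pointer with that block's checksum.
struct blockref {
  uint addr;       // disk block number (0 = not allocated)
  uint checksum;   // Adler-32 of the pointed-to block
};

// 12 bytes of fixed fields plus 4 bytes for the indirect pointer leave
// 48 bytes, i.e. room for six 8-byte (pointer, checksum) pairs.
#define NDIRECT   6
#define NINDIRECT (BSIZE / sizeof(struct blockref))

// Sketch of an on-disk inode. Everything after size is an assumption.
struct dinode_sketch {
  short type;
  short major;
  short minor;
  short nlink;
  uint size;
  struct blockref addrs[NDIRECT];
  uint indirect;   // indirect block pointer, stored without a checksum
};

// Compile-time version of the gdb size check.
static_assert(sizeof(struct dinode_sketch) == 64,
              "on-disk inode must stay 64 bytes");
```

Under these assumptions NDIRECT drops to 6 and an indirect block holds 64 (pointer, checksum) pairs.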

With your inode structs suitably modified, you should then go through and make the necessary changes to the two components that care about inode internals: kernel/fs.c and tools/mkfs.c. (Look for functions that use the addrs member of the inode struct.)

The checksums you'll be using will be computed using the Adler-32 algorithm. You can use it in mkfs.c and fs.c by adding the following code to include/fs.h:

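#define ADLER32_BASE 65521U  // largest prime below 2^16

// Adler-32 checksum of the len bytes starting at data.
static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;  // running sum of the bytes, starting from 1
    b = (b + a) % ADLER32_BASE;                  // running sum of the successive values of a
  }

  return (b << 16) | a;  // b in the high 16 bits, a in the low 16 bits
}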
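
If you want to convince yourself that the function survived the copy and paste, you could check it on the host against two known values. The sketch below is not part of xv6: the typedefs stand in for xv6's types.h, and adler32_selftest is a made-up name. The first value is the commonly cited Adler-32 of the ASCII string "Wikipedia"; the second follows directly from the loop (for 512 zero bytes, a stays at 1 and b ends at 512).

```c
#include <assert.h>
#include <string.h>

typedef unsigned int uint;    // stand-ins for xv6's types.h
typedef unsigned char uchar;

#define ADLER32_BASE 65521U

static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;
    b = (b + a) % ADLER32_BASE;
  }

  return (b << 16) | a;
}

// Hypothetical self-test: aborts if adler32() gives an unexpected value.
static void adler32_selftest(void)
{
  char text[] = "Wikipedia";
  uchar zeros[512];

  memset(zeros, 0, sizeof(zeros));
  assert(adler32(text, strlen(text)) == 0x11E60398U);   // a = 920, b = 4582
  assert(adler32(zeros, sizeof(zeros)) == 0x02000001U); // a = 1,   b = 512
}
```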

Checksumming should be applied to each block that holds file contents or directory entries. It should be computed over a whole number of blocks, even if the file or directory data only uses part of a block (i.e. even if a file is only 10 bytes long, the checksum should still be computed over the entire block used to store those 10 bytes). You do NOT need to compute or store checksums of the contents of indirect blocks or any metadata other than directory contents (just blocks containing file data and directory entries).
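
The whole-block rule can be sketched as follows. The helper names block_checksum and short_file_checksum are hypothetical, and the sketch assumes the unused tail of the block is zero-filled; in the real filesystem you simply checksum whatever 512 bytes the block holds.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int uint;    // stand-ins for xv6's types.h
typedef unsigned char uchar;

#define BSIZE 512
#define ADLER32_BASE 65521U

static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;
    b = (b + a) % ADLER32_BASE;
  }

  return (b << 16) | a;
}

// Hypothetical helper: checksum of one full disk block, never less.
static uint block_checksum(uchar *block)
{
  return adler32(block, BSIZE);
}

// Checksum that would be stored for a file of len bytes occupying a
// single block, assuming the rest of the block is zero-filled.
static uint short_file_checksum(const char *contents, uint len)
{
  uchar block[BSIZE];

  assert(len <= BSIZE);
  memset(block, 0, BSIZE);
  memcpy(block, contents, len);
  return block_checksum(block);
}
```

Note that the result is not the same as running adler32 over just the 10 bytes of a 10-byte file: the zero padding still advances the b half of the checksum.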

Note: because the filesystem isn't verifying checksums at this stage, mkfs doesn't strictly have to fill them in yet. (You might as well do it now, though; otherwise it simply becomes part of part 2.)

Once you've done all this properly, you should be able to boot and run a healthy xv6 -- things like cat README and echo foo > newfile in the xv6 shell should work as usual.


Sum-checking

Now that your checksums are in place, start using them in your filesystem: when you read a block of file data or directory entries, compute the checksum of the raw data returned by the read operation and compare it against the stored checksum for that block. If the two match, everything proceeds as usual. If they differ, however, you should fail the operation (ensure that whatever system call triggered the read returns -1) and make the following cprintf call to print an error message to the console:

cprintf("Error: checksum mismatch, block %d\n", blocknum);

Here blocknum must be the disk block number (not the block number within the file).
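
A minimal sketch of the comparison itself, written so it builds on the host. verify_block is a hypothetical helper, not an existing xv6 function, and cprintf is mapped to printf here only so the sketch compiles outside the kernel. Where you call something like this, and how the -1 travels back up to the system call, is up to your design.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int uint;    // stand-ins for xv6's types.h
typedef unsigned char uchar;

#define BSIZE 512
#define ADLER32_BASE 65521U
#define cprintf printf        // host stand-in for the kernel's cprintf

static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;
    b = (b + a) % ADLER32_BASE;
  }

  return (b << 16) | a;
}

// Hypothetical helper: returns 0 if the block just read from disk block
// blocknum matches its stored checksum, -1 (after complaining) if not.
static int verify_block(uchar *data, uint stored, uint blocknum)
{
  assert(data != NULL);
  if (adler32(data, BSIZE) != stored) {
    cprintf("Error: checksum mismatch, block %d\n", blocknum);
    return -1;
  }
  return 0;
}
```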

[Update: unfortunately, to actually test this you'll need to implement part 3 first; until then xv6 won't boot properly.]


Check-summing

At this point, your system should mostly work fine, but any newly-created or modified files and directories should be inaccessible (because they won't match their stored checksums).

[UPDATE: At this point, if you attempt to boot xv6 you'll probably encounter a checksum mismatch on block 29. This is due to init creating the special device node console at runtime when it first starts, which is a modification to the root directory. If the root directory's checksum isn't updated accordingly, subsequent accesses to it to look up files (such as sh to start the shell) will fail. If you want to temporarily hack in a special-case bypass for this specific problem to allow xv6 to boot for testing purposes (e.g. if (blocknum == 29) skip_checksum_check) you may do so, but make sure to take it out before handing in your project.]

Now it's time to fix that: modify the filesystem so that anytime it writes a block of file contents or directory entries, it computes a new checksum of the block and stores it alongside the pointer to that block.
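
The key point is that a write usually touches only part of a block, but the checksum always covers the whole block after the new bytes have been merged in. A host-side sketch, assuming a hypothetical blockref struct that pairs the pointer with its checksum and a hypothetical write_partial helper:

```c
#include <assert.h>
#include <string.h>

typedef unsigned int uint;    // stand-ins for xv6's types.h
typedef unsigned char uchar;

#define BSIZE 512
#define ADLER32_BASE 65521U

static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;
    b = (b + a) % ADLER32_BASE;
  }

  return (b << 16) | a;
}

// Hypothetical (pointer, checksum) pair as stored in the inode.
struct blockref {
  uint addr;
  uint checksum;
};

// Hypothetical helper: copy n bytes into the in-memory copy of a block
// at offset off, then record the checksum of the entire updated block.
static void write_partial(struct blockref *ref, uchar *block,
                          uint off, const void *src, uint n)
{
  assert(off <= BSIZE && n <= BSIZE - off);
  memcpy(block + off, src, n);             // merge the new bytes
  ref->checksum = adler32(block, BSIZE);   // re-checksum all BSIZE bytes
}
```

In the real filesystem the updated checksum also has to reach the disk, either in the inode or in the indirect block, depending on where that block's pointer lives.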

When this is implemented correctly, you should be able to access newly-created files and directories, so commands like the following should work in your xv6 shell:

$ mkdir newdir
$ ls newdir
.              1 21 32
..             1 1 512
$ echo abc def > newdir/newfile
$ cat newdir/newfile
abc def
$ echo xyz > newdir/newfile
$ cat newdir/newfile
xyz
def
$ ls newdir
.              1 21 48
..             1 1 512
newfile        2 22 8

(Note that xv6 overwrites without truncating with the > redirection operator, so some of the old data remains in the output of the second cat command.)

[UPDATE (formerly in part 2): To test your checksum validation, run make to create the filesystem image file fs.img (with correct checksums) and then manually corrupt it by using a hex editor (or other means of your choosing) to alter the contents of a block containing file contents or directory entries. You should then be able to see your checksum-validation code step in and prevent the corrupted data from being used. For example, if you alter the contents of the README file in the image, cat README should trigger the above error message and fail. If you alter the contents of an executable such as echo, an attempt to run it should fail similarly.]
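
If you'd rather not use a hex editor, one of the "other means" could be a tiny host-side tool along these lines. corrupt_block is a hypothetical helper, not something that ships with xv6; it inverts a single byte at a given offset inside a given disk block of the image. Working out which block number holds the data you want to damage is left to you (printing block numbers from mkfs is one way).

```c
#include <assert.h>
#include <stdio.h>

#define BSIZE 512   // xv6 block size in bytes

// Hypothetical tool function: invert one byte of disk block blocknum in
// the image file at path. Returns 0 on success, -1 on any error.
static int corrupt_block(const char *path, long blocknum, long offset)
{
  FILE *f;
  long pos;
  int c;

  assert(path != NULL);
  if (blocknum < 0 || offset < 0 || offset >= BSIZE)
    return -1;
  f = fopen(path, "r+b");
  if (f == NULL)
    return -1;

  pos = blocknum * BSIZE + offset;
  if (fseek(f, pos, SEEK_SET) != 0 ||
      (c = fgetc(f)) == EOF ||
      fseek(f, pos, SEEK_SET) != 0 ||    // reposition before writing
      fputc(c ^ 0xFF, f) == EOF) {
    fclose(f);
    return -1;
  }
  return fclose(f) == 0 ? 0 : -1;
}
```

Remember that make may regenerate fs.img, so corrupt the image after building and before booting.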


Checksums via fstat

Finally, after adding this integrity-checking metadata to your inodes, you should extend the fstat system call to let user programs retrieve it. To do this, add a uint member to the stat struct (in include/stat.h) called checksum. Your stat struct should then look like this:

struct stat {
  short type;  // Type of file
  int dev;     // Device number
  uint ino;    // Inode number on device
  short nlink; // Number of links to file
  uint size;   // Size of file in bytes
  uint checksum; // XOR of the Adler-32 checksums of the file's blocks
};

When fstat is called, it should fill in the checksum member of the stat struct with the XOR of the Adler-32 checksums of all the blocks in the file. (If the file is empty, the checksum field should be set to zero.) You should test this out by writing a small utility program called stat that calls fstat on a file passed as a command-line argument and prints all the fields in the resulting stat struct. (Don't worry about the exact format of your stat program's output; it's just a tool for your own use to make sure your fstat modifications are working properly.)
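
The combining step is just a fold with XOR. The sketch below assumes you have already gathered the stored per-block checksums (from the inode and, for larger files, the indirect block) into an array; file_checksum is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int uint;   // stand-in for xv6's types.h

// Hypothetical helper: XOR together the stored checksums of a file's
// nblocks data blocks. An empty file (nblocks == 0) yields 0.
static uint file_checksum(const uint *sums, uint nblocks)
{
  uint i, x = 0;

  assert(sums != NULL || nblocks == 0);
  for (i = 0; i < nblocks; i++)
    x ^= sums[i];
  return x;
}
```

Since XOR is commutative, the order in which you visit the blocks doesn't matter, and two blocks with identical contents cancel each other out.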

[UPDATE: To test a bit of all four parts together, you should be able to successfully run the following commands (recall that cat is short for "concatenate"):

    $ cat README README > README2
    $ cat README2 README2 > README4
    $ stat README4

The checksum of README4 printed by your stat program should then be 0xB48D015F (use printf("%x", ...) to get hexadecimal).]


Bonus: Double-indirect blocks

The modifications made to support checksums have an unfortunate side effect: because you can now fit only half as many block pointers in your inode and indirect block, the maximum file size supported by your filesystem is cut in half.

For bonus points, you can address this downside by adding a double-indirect block. This is like the indirect block pointer in the inode (that points to a block of block pointers), but adds another level: it's a block of pointers to blocks of block pointers. So as not to reduce the number of direct blocks, you should use the last block pointer in your indirect block as a pointer to your double-indirect block (the contents of the double-indirect block do not need to be checksummed, so you may end up with 4 unused bytes at the end of the indirect block).

This should allow you to support files up to a bit over 2MB [UPDATE: 4MB, if done properly]. If you want to actually explore that range of file sizes, though, you'll need to tweak mkfs a bit further and modify the xv6 Makefile to create a larger filesystem image. (Also keep in mind that creating large files in xv6 may be a pretty slow process.)
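
The block-count arithmetic can be sketched as below. All of the constants are assumptions rather than requirements: 512-byte blocks, six direct (pointer, checksum) pairs in the inode, and an indirect block of 64 pairs whose last pointer slot is given to the double-indirect block. One reading of the two figures above is that they differ in how many entries the double-indirect block holds: 64 if you keep the 8-byte pair layout there, 128 if you pack bare 4-byte pointers (its entries point at unchecksummed pointer blocks, so they need no checksum slot).

```c
#include <assert.h>

typedef unsigned int uint;   // stand-in for xv6's types.h

#define BSIZE 512
#define NDIRECT 6                              // assumption: see above
#define PAIRS_PER_BLOCK (BSIZE / 8)            // 64 (pointer, checksum) pairs
#define PTRS_PER_BLOCK  (BSIZE / 4)            // 128 bare pointers
#define NINDIRECT_DATA  (PAIRS_PER_BLOCK - 1)  // last slot -> double-indirect

// Maximum file size in blocks when the double-indirect block holds
// dbl_entries pointers, each to a block of (pointer, checksum) pairs.
static uint max_file_blocks(uint dbl_entries)
{
  assert(dbl_entries <= PTRS_PER_BLOCK);
  return NDIRECT + NINDIRECT_DATA + dbl_entries * PAIRS_PER_BLOCK;
}
```

With these numbers, max_file_blocks(64) is 4165 blocks (2,132,480 bytes, a bit over 2MB) and max_file_blocks(128) is 8261 blocks (4,229,632 bytes, a bit over 4MB). For comparison, without the bonus the same assumptions give 6 + 64 = 70 blocks, half of the 12 + 128 = 140 blocks of stock xv6.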

As with project 4, if you do the bonus you should:

  1. mention it in your README, and

  2. submit it in a separate handin directory (xv6-bonus).

Because of the dependent nature of the features, your xv6-bonus code must also include all the modifications for the main part of the project (i.e. it must implement both double-indirect blocks and all the checksumming features of parts 1 through 4).


Tips, notes, things to keep in mind