In this project you'll be adding integrity-checking to the xv6 filesystem to protect it against data corruption by computing checksums of file contents and directory entries.
In part 1 you'll rearrange the filesystem's metadata layout and modify the mkfs command to match.
In part 2 you'll have the filesystem verify the checksums of any data and directory entries it reads from the disk.
In part 3 you'll have the filesystem compute, update, and record new checksums for files and directories created or modified at runtime.
In part 4 you'll modify the fstat system call to add some data integrity information to the metadata it retrieves.
For bonus points, you can implement a double-indirect block to support larger files.
Start out by familiarizing yourself with the layout of the xv6 filesystem. Chapter 6 of the xv6 book is a good place to start for that. Note though that the version of xv6 we're using does not include the logging feature described in the book; you can safely ignore the parts that pertain to that.
Start your xv6 modifications by changing the filesystem's inode structure. Each 4-byte block pointer in the addrs array should be accompanied by a 4-byte checksum of the data contained in the pointed-to block. You do not need to checksum the contents of the indirect block, so you should use a dedicated member of the inode structure to store the pointer to it alone (without a checksum) instead of storing it in the addrs array as xv6 does in its unmodified form.
You'll need to change both the in-memory inode (struct inode in kernel/file.h) and the on-disk inode (struct dinode in include/fs.h). Make sure you do not change the size of the on-disk inode -- it should remain 64 bytes. Because you'll now be using twice as many bytes for each block pointer (4 bytes for the pointer itself and another 4 for the checksum of the block it points to), you'll need to change the definitions of the NDIRECT and NINDIRECT macros. You can check the size of your on-disk inode struct by compiling your code and then running:
$ gdb -batch -ex 'p sizeof(struct dinode)' kernel/kernel
(Maintaining the size of the in-memory inode, struct inode, should follow naturally as well, but isn't as critical.)
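As a rough illustration of the size constraint, the sketch below shows one possible layout that still comes to 64 bytes. It assumes the stock xv6 fields (type, major, minor, nlink, size) occupy 12 bytes; the names blockref, direct, indirect, and dinode_sketch are hypothetical, and the value of NDIRECT shown is what this particular layout implies rather than a required answer.

```c
#include <stdint.h>

/* Assumed value: (64 - 12 bytes of other fields - 4-byte indirect pointer) / 8. */
#define NDIRECT 6

/* Hypothetical pairing of a block pointer with its checksum. */
struct blockref {
    uint32_t addr;      /* disk block number */
    uint32_t checksum;  /* Adler-32 of the pointed-to block */
};

/* Hypothetical on-disk inode layout; only the first five fields are stock xv6. */
struct dinode_sketch {
    int16_t  type;
    int16_t  major;
    int16_t  minor;
    int16_t  nlink;
    uint32_t size;
    struct blockref direct[NDIRECT];  /* pointer + checksum pairs */
    uint32_t indirect;                /* pointer only, no checksum */
};

/* Fails the build if the layout drifts away from 64 bytes. */
_Static_assert(sizeof(struct dinode_sketch) == 64,
               "on-disk inode must stay 64 bytes");
```

A compile-time check like this complements the gdb command above, since it catches a size change before the kernel is ever booted.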
With your inode structs suitably modified, you should then go through and make the necessary changes to the two components that care about inode internals: kernel/fs.c and tools/mkfs.c. (Look for functions that use the addrs member of the inode struct.)
The checksums you'll be using will be computed using the Adler-32 algorithm. You can use it in mkfs.c and fs.c by adding the following code to include/fs.h:
#define ADLER32_BASE 65521U

static inline uint adler32(void* data, uint len)
{
  uint i, a = 1, b = 0;

  for (i = 0; i < len; i++) {
    a = (a + ((uchar*)data)[i]) % ADLER32_BASE;
    b = (b + a) % ADLER32_BASE;
  }

  return (b << 16) | a;
}
Checksumming should be applied to each block that holds file contents or directory entries. It should be computed over a whole number of blocks, even if the file or directory data only uses part of a block (i.e. even if a file is only 10 bytes long, the checksum should still be computed over the entire block used to store those 10 bytes). You do NOT need to compute or store checksums of the contents of indirect blocks or any metadata other than directory contents (just blocks containing file data and directory entries).
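To make the whole-block rule concrete, here is a minimal userland sketch. It re-declares the same Adler-32 routine with uint32_t and uint8_t standing in for xv6's uint and uchar, and block_checksum_of_short_file is a hypothetical helper that assumes the unused tail of the block is zero-filled.

```c
#include <stdint.h>
#include <string.h>

#define BSIZE 512
#define ADLER32_BASE 65521U

/* Same algorithm as the handout's adler32(), with stdint types. */
static uint32_t adler32(const void *data, uint32_t len)
{
    const uint8_t *p = data;
    uint32_t i, a = 1, b = 0;

    for (i = 0; i < len; i++) {
        a = (a + p[i]) % ADLER32_BASE;
        b = (b + a) % ADLER32_BASE;
    }
    return (b << 16) | a;
}

/* Hypothetical helper: checksum the full block that would hold an n-byte
   file (n <= BSIZE), assuming the rest of the block is zero. */
static uint32_t block_checksum_of_short_file(const void *contents, uint32_t n)
{
    uint8_t block[BSIZE];

    memset(block, 0, sizeof(block));
    memcpy(block, contents, n);
    return adler32(block, BSIZE);   /* always BSIZE bytes, never just n */
}
```

Two handy reference values: the Adler-32 of the 9 ASCII bytes "Wikipedia" is 0x11E60398, and the Adler-32 of an all-zero 512-byte block is 0x02000001. Checksumming only the used bytes of a short file gives a different value from checksumming its whole block, which is why the distinction matters.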
Note: since at this stage the filesystem isn't actually verifying the checksums yet, you don't actually have to have mkfs fill them in with anything. (Though you might as well; it'll just become part of part 2 if you don't.)
Once you've done all this properly, you should be able to boot and run a healthy xv6 -- things like cat README and echo foo > newfile in the xv6 shell should work as usual.
Now that your checksums are in place, start using them in your filesystem: when you read a block of file data or directory entries, compute the checksum of the raw data returned by the read operation and compare it against the stored checksum for that block. If the two match, everything proceeds as usual. If, however, the two are not equal, you should fail the operation (ensure that whatever system call incurred the read returns -1) and make the following cprintf call to print an error message to the console:
cprintf("Error: checksum mismatch, block %d\n", blocknum);
Here blocknum must be the disk block number (not the block number within the file).
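The comparison step might be sketched as follows. This is a userland approximation, not kernel code: verify_block is a hypothetical helper, printf stands in for cprintf, and how the -1 propagates up to the system call depends on where you place the check in fs.c.

```c
#include <stdint.h>
#include <stdio.h>

#define BSIZE 512
#define ADLER32_BASE 65521U

/* Same algorithm as the handout's adler32(), with stdint types. */
static uint32_t adler32(const void *data, uint32_t len)
{
    const uint8_t *p = data;
    uint32_t i, a = 1, b = 0;

    for (i = 0; i < len; i++) {
        a = (a + p[i]) % ADLER32_BASE;
        b = (b + a) % ADLER32_BASE;
    }
    return (b << 16) | a;
}

/* Hypothetical helper: returns 0 if the block's data matches the stored
   checksum, -1 otherwise. blocknum is the disk block number. */
static int verify_block(const uint8_t *data, uint32_t stored, uint32_t blocknum)
{
    if (adler32(data, BSIZE) == stored)
        return 0;

    /* In the kernel this would be the cprintf call given above. */
    printf("Error: checksum mismatch, block %d\n", (int)blocknum);
    return -1;
}
```

The caller would then abandon the read and return -1 rather than copying any of the block's data out to the user.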
[UPDATE: unfortunately, to test that this is working you'll need to implement part 3 first, before xv6 will boot properly.]
[UPDATE: At this point, if you attempt to boot xv6 you'll probably encounter a checksum mismatch on block 29. This is due to init creating the special device node console at runtime when it first starts, which is a modification to the root directory. If the root directory's checksum isn't updated accordingly, subsequent accesses to it to look up files (such as sh to start the shell) will fail. If you want to temporarily hack in a special-case bypass for this specific problem to allow xv6 to boot for testing purposes (e.g. if (blocknum == 29) skip_checksum_check) you may do so, but make sure to take it out before handing in your project.]
Now it's time to fix that: modify the filesystem so that anytime it writes a block of file contents or directory entries, it computes a new checksum of the block and stores it alongside the pointer to that block.
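One way to picture the write path is below: a userland sketch in which a partial write modifies the in-memory copy of a block and the checksum is then recomputed over the entire block. The blockref struct and write_into_block are hypothetical names, and writing the block and the updated inode back to disk is left out.

```c
#include <stdint.h>
#include <string.h>

#define BSIZE 512
#define ADLER32_BASE 65521U

/* Same algorithm as the handout's adler32(), with stdint types. */
static uint32_t adler32(const void *data, uint32_t len)
{
    const uint8_t *p = data;
    uint32_t i, a = 1, b = 0;

    for (i = 0; i < len; i++) {
        a = (a + p[i]) % ADLER32_BASE;
        b = (b + a) % ADLER32_BASE;
    }
    return (b << 16) | a;
}

/* Hypothetical pairing of a block pointer with its checksum. */
struct blockref {
    uint32_t addr;
    uint32_t checksum;
};

/* Hypothetical helper: copy n bytes into the block at offset off, then
   refresh the stored checksum. Returns 0 on success, -1 if out of range. */
static int write_into_block(uint8_t *block, struct blockref *ref,
                            uint32_t off, const void *src, uint32_t n)
{
    if (off > BSIZE || n > BSIZE - off)
        return -1;

    memcpy(block + off, src, n);
    /* The checksum always covers the whole block, not just the bytes written. */
    ref->checksum = adler32(block, BSIZE);
    return 0;
}
```

Because the checksum lives next to the pointer, the structure holding it (the inode or the indirect block) also has to be written back after every data-block write, or the next read will see a stale checksum.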
When this is implemented correctly, you should be able to access newly-created files and directories, so commands like the following should work in your xv6 shell:
$ mkdir newdir
$ ls newdir
. 1 21 32
.. 1 1 512
$ echo abc def > newdir/newfile
$ cat newdir/newfile
abc def
$ echo xyz > newdir/newfile
$ cat newdir/newfile
xyz
def
$ ls newdir
. 1 21 48
.. 1 1 512
newfile 2 22 8
(Note that xv6 overwrites without truncating with the > redirection operator, so some of the old data remains in the output of the second cat command.)
[UPDATE (formerly in part 2): To test your checksum validation, run make to create the filesystem image file fs.img (with correct checksums) and then manually corrupt it by using a hex editor (or other means of your choosing) to alter the contents of a block containing file contents or directory entries. You should then be able to see your checksum-validation code step in and prevent the corrupted data from being used. For example, if you alter the contents of the README file in the image, cat README should trigger the above error message and fail. If you alter the contents of an executable such as echo, an attempt to run it should fail similarly.]
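If you'd rather not use a hex editor, one of the "other means" could be a tiny host-side program. The sketch below is only an illustration: flip_byte is a hypothetical helper that inverts a single byte at a given offset inside a given 512-byte block of an already-open image file.

```c
#include <stdio.h>

#define BSIZE 512

/* Hypothetical helper: invert the byte at offset off within block blocknum.
   Returns 0 on success, -1 on a bad argument or I/O error. */
static int flip_byte(FILE *img, long blocknum, long off)
{
    long pos = blocknum * BSIZE + off;
    int c;

    if (blocknum < 0 || off < 0 || off >= BSIZE)
        return -1;
    if (fseek(img, pos, SEEK_SET) != 0)
        return -1;
    if ((c = fgetc(img)) == EOF)
        return -1;
    /* A seek is required between a read and a write on an update stream. */
    if (fseek(img, pos, SEEK_SET) != 0)
        return -1;
    if (fputc(c ^ 0xFF, img) == EOF)
        return -1;
    return fflush(img) == 0 ? 0 : -1;
}
```

You would open the image with fopen("fs.img", "r+b") and pass the number of a block that you have confirmed holds file data or directory entries; corrupting a metadata block will not exercise the checksum code.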
fstat
Finally, after adding this integrity-checking metadata to your inodes, you should extend the fstat system call to let user programs retrieve it. To do this, add a uint member to the stat struct (in include/stat.h) called checksum. Your stat struct should then look like this:
struct stat {
  short type;    // Type of file
  int dev;       // Device number
  uint ino;      // Inode number on device
  short nlink;   // Number of links to file
  uint size;     // Size of file in bytes
  uint checksum;
};
When fstat is called, it should fill in the checksum member of the stat struct with the XOR of the Adler-32 checksums of all the blocks in the file. (If the file is empty, the checksum field should be set to zero.) You should test this out by writing a simple utility program called stat that simply calls fstat on a file passed as a command-line argument and prints all the fields in the resulting stat struct. (Don't worry about the exact format of your stat program's output; it's just a tool for your own use to make sure your fstat modifications are working properly.)
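The combining rule itself is small. In the sketch below, file_checksum is a hypothetical helper that takes the per-block checksums as an array; in the kernel you would instead walk the inode's direct entries and the entries of its indirect block.

```c
#include <stdint.h>

/* Hypothetical helper: XOR together the stored checksums of a file's blocks.
   Starting from 0 means an empty file (nblocks == 0) naturally yields 0. */
static uint32_t file_checksum(const uint32_t *block_sums, uint32_t nblocks)
{
    uint32_t i, x = 0;

    for (i = 0; i < nblocks; i++)
        x ^= block_sums[i];
    return x;
}
```

Note that XOR is self-cancelling: two blocks with identical contents contribute nothing to the result, so a file made of two identical blocks has the same combined value as an empty file.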
[UPDATE: To test a bit of all four parts together, you should be able to successfully run the following commands (recall that cat is short for "concatenate"):
$ cat README README > README2
$ cat README2 README2 > README4
$ stat README4
The checksum of README4 printed by your stat program should then be 0xB48D015F (use printf("%x", ...) to get hexadecimal).]
The modifications made to support checksums have an unfortunate side effect: because you can now fit only half as many block pointers in your inode and indirect block, the maximum file size supported by your filesystem is cut in half.
For bonus points, you can address this downside by adding a double-indirect block. This is like the indirect block pointer in the inode (that points to a block of block pointers), but adds another level: it's a block of pointers to blocks of block pointers. So as not to reduce the number of direct blocks, you should use the last block pointer in your indirect block as a pointer to your double-indirect block (the contents of the double-indirect block do not need to be checksummed, so you may end up with 4 unused bytes at the end of the indirect block).
This should allow you to support files up to a bit over 2MB [UPDATE: 4MB, if done properly]. If you want to actually explore that range of file sizes, though, you'll need to tweak mkfs a bit further and modify the xv6 Makefile to create a larger filesystem image. (Also keep in mind that creating large files in xv6 may be a pretty slow process.)
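The size figures above can be checked with a little arithmetic. The sketch below rests on several assumptions that are not stated in the handout: 512-byte blocks, 6 direct entries (what a 64-byte inode leaves room for once each entry is 8 bytes), 64 pointer-plus-checksum entries per indirect block, and the last indirect entry given up for the double-indirect pointer. One reading of the 2MB versus 4MB figures is whether the double-indirect block holds 64 eight-byte entries or 128 bare 4-byte pointers.

```c
#define BSIZE      512UL
#define NDIRECT    6UL             /* assumed direct entries per inode */
#define NINDIRECT  (BSIZE / 8)     /* 64 pointer+checksum entries per block */
#define NBARE      (BSIZE / 4)     /* 128 bare pointers per block */

/* Blocks addressable with direct entries plus one indirect block. */
static unsigned long max_blocks_single(void)
{
    return NDIRECT + NINDIRECT;
}

/* Blocks addressable when the last indirect entry points at a double-indirect
   block that itself holds `fanout` pointers to blocks of NINDIRECT entries. */
static unsigned long max_blocks_double(unsigned long fanout)
{
    return NDIRECT + (NINDIRECT - 1) + fanout * NINDIRECT;
}
```

Under these assumptions, max_blocks_single() * BSIZE is 35,840 bytes (half of stock xv6's 71,680), max_blocks_double(NINDIRECT) * BSIZE is 2,132,480 bytes, and max_blocks_double(NBARE) * BSIZE is 4,229,632 bytes.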
As with project 4, if you do the bonus you should:
mention it in your README, and
hand it in in a separate handin directory (xv6-bonus).
Because of the dependent nature of the features, your xv6-bonus code must also include all the modifications for the main part of the project (i.e. it must implement both double-indirect blocks and all the checksumming features of parts 1 through 4).
To inspect the contents of a specific block of your filesystem image, run
$ dd status=none if=fs.img bs=512 count=1 skip=BLOCKNUM | hexdump -Cv
replacing BLOCKNUM with the (zero-based) number of the block you want.
The filesystem uses little-endian ("Intel order") byte ordering, so multi-byte fields may appear "backwards" from what you'd expect.
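For example, when reading a 4-byte field out of a hexdump, the bytes appear lowest-order first. The hypothetical le32 helper below reassembles them into the value the kernel would see.

```c
#include <stdint.h>

/* Hypothetical helper: combine 4 bytes, in the order they appear on disk
   (and in hexdump output), into a 32-bit little-endian value. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

So a stored checksum of 0x11E60398 would show up in the dump as the byte sequence 98 03 e6 11.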
Running make qemu-nox will reuse an existing fs.img, including modifications from any writes performed during previous runs. If you want to start with a fresh filesystem image, run rm fs.img before make qemu-nox so that it will run mkfs to rebuild the image before booting xv6.