Project 3: File systems

Due Date: Thursday, April 16 at 9 pm (moved from Tuesday, April 14).

Updates

  • April 16th: You only need to turn in one copy of the project per group.
  • April 15th: This is in response to a question. You do not need to copy/save the contents of free blocks when you create the defragmented disk. They can be left as they are, initialized to zeros, or anything else as long as the free list itself is well constructed (i.e., it is a valid linked list).
  • April 12th. There is no specification under part 1 of what to turn in. You should turn in your code and a README file explaining any particular details of the code. We will grade your code for correctness and for a minimum of programming quality (such as checking for errors). If your code does not work, then it is in your best interest to make it easy to read so we can try to give you partial credit.
  • April 6th. For part 1, the inodes in the inode region are all contiguous: the whole region is one big array, independent of block boundaries. So it is possible for there to be an inode that overlaps two blocks.

For the second part, inode blocks are distributed, so there will be no inodes that overlap two blocks.

  • For part 1, the assignment did not previously say how to pass the name of the disk data file to your program. Your program should take one parameter: the name of the file.
  • You will need to do binary file I/O to read in the datafiles. You can do this with the fread() C library function. Here is some sample code:
   FILE * f;
   unsigned char * buffer;
   size_t bytes;
   buffer = malloc(10*1024*1024);
   f = fopen("datafile-defrag","rb");   /* "rb": open in binary mode */
   bytes = fread(buffer,1,10*1024*1024,f);   /* item size 1, so the return value is a byte count */

Please do not use this directly - it does no error checking. If you want to find out how big a file is from within your program, you can use the fstat function.

  • Friday, April 3. Until today, the project did not specify the names of the executables. The disk defragmenter should be named defrag and the analyzer for part 2 should be named analyzer. Your Makefile should build both.
  • To simplify the project, I will make sure that the disk fragment you get for part 2 starts at block zero, rather than from somewhere in the middle of the disk.
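As a follow-up to the file I/O update above, here is a minimal sketch of reading a whole disk image with error checking, using fstat to find the file size. The helper name read_image is hypothetical and not part of the assignment, and the sketch assumes a POSIX system where fileno and fstat are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Read an entire file into a freshly allocated buffer.
 * On success, returns the buffer (the caller frees it) and stores its
 * length in *len_out. On any error, returns NULL.
 * read_image is a hypothetical helper name. */
unsigned char *read_image(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");            /* "rb": binary mode */
    if (f == NULL) {
        perror(path);
        return NULL;
    }

    struct stat st;
    if (fstat(fileno(f), &st) != 0) {       /* ask the OS for the file size */
        perror("fstat");
        fclose(f);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *buf = malloc(len ? len : 1);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    /* With an item size of 1, fread returns the number of bytes read. */
    if (fread(buf, 1, len, f) != len) {
        fprintf(stderr, "%s: short read\n", path);
        free(buf);
        fclose(f);
        return NULL;
    }

    fclose(f);
    *len_out = len;
    return buf;
}
```

A caller would pass the file name from the command line, for example read_image(argv[1], &len), and exit with an error message if the result is NULL.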

Goals

  • To get experience with file system structure

Part 1

File system defragmenters can improve performance by laying out all the blocks of a file in sequential order on disk. For this assignment, you will write a disk defragmenter for the Unix file system.

Data structures

There are two data structures stored on disk: the superblock and the inode.

  struct superblock {
    int blocksize; /* size of blocks in bytes */
    int inode_offset; /* offset of inode region in blocks */
    int data_offset; /* data region offset in blocks */
    int swap_offset; /* swap region offset in blocks */
    int free_inode; /* head of free inode list */
    int free_block; /* head of free block list */
  };

On disk, the first 512 bytes contain the boot block, and you can ignore it. The second 512 bytes contain the superblock. All offsets in the superblock are measured in blocks, counted from the point 1024 bytes into the disk. Thus, if the inode offset is 1 and the block size is 1 KB, the inode region starts at 1024 + 1*1KB = 2KB into the disk. Each region fills the disk up to the start of the next region; the swap region fills the disk to the end.
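The offset arithmetic can be sketched as a small helper. The names region_start and BOOT_AND_SUPER_BYTES are illustrative only and are not defined by the assignment.

```c
#include <stddef.h>

/* 512-byte boot block followed by the 512-byte superblock. */
#define BOOT_AND_SUPER_BYTES 1024

struct superblock {
    int blocksize;     /* size of blocks in bytes */
    int inode_offset;  /* offset of inode region in blocks */
    int data_offset;   /* data region offset in blocks */
    int swap_offset;   /* swap region offset in blocks */
    int free_inode;    /* head of free inode list */
    int free_block;    /* head of free block list */
};

/* Byte position, from the start of the disk image, of a region whose
 * offset is given in blocks (hypothetical helper). */
size_t region_start(const struct superblock *sb, int offset_in_blocks)
{
    return BOOT_AND_SUPER_BYTES
         + (size_t)offset_in_blocks * (size_t)sb->blocksize;
}
```

With blocksize 1024 and inode_offset 1, region_start(&sb, sb.inode_offset) gives 2048, matching the 2KB worked example in the text.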

The inode structure is given below:

  #define N_DBLOCKS 10 
  #define N_IBLOCKS 4  
  struct inode {  
      int next_inode; /* list for free inodes */  
      int protect;        /*  protection field */ 
      int nlink;  /* Number of links to this file */ 
      int size;  /* Number of bytes in file */   
      int uid;   /* Owner's user ID */  
      int gid;   /* Owner's group ID */  
      int ctime;  /* Time field */  
      int mtime;  /* Time field */  
      int atime;  /* Time field */  
      int dblocks[N_DBLOCKS];   /* Pointers to data blocks */  
      int iblocks[N_IBLOCKS];   /* Pointers to indirect blocks */  
      int i2block;     /* Pointer to doubly indirect block */  
      int i3block;     /* Pointer to triply indirect block */  
   };

The inode region is effectively a large array of inodes. An unused inode has zero in the nlink field and the next_inode field contains the index of the next free inode. For inodes in use, the next_inode field is not used.
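Because the inode region is one big array, inode number i sits at a fixed byte offset from the start of the region. The sketch below copies an inode out of a raw image with memcpy, which also works when an inode straddles a block boundary. It assumes that int is 4 bytes and that the host byte order matches the image; load_inode and inode_in_use are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

#define N_DBLOCKS 10
#define N_IBLOCKS 4
struct inode {
    int next_inode;           /* list for free inodes */
    int protect;              /* protection field */
    int nlink;                /* number of links to this file */
    int size;                 /* number of bytes in file */
    int uid;                  /* owner's user ID */
    int gid;                  /* owner's group ID */
    int ctime;                /* time field */
    int mtime;                /* time field */
    int atime;                /* time field */
    int dblocks[N_DBLOCKS];   /* pointers to data blocks */
    int iblocks[N_IBLOCKS];   /* pointers to indirect blocks */
    int i2block;              /* pointer to doubly indirect block */
    int i3block;              /* pointer to triply indirect block */
};

/* Copy inode number `index` out of the raw disk image.
 * inode_region_start is the byte offset of the inode region. */
void load_inode(const unsigned char *image, size_t inode_region_start,
                int index, struct inode *out)
{
    memcpy(out,
           image + inode_region_start + (size_t)index * sizeof *out,
           sizeof *out);
}

/* An unused inode has zero in its nlink field. */
int inode_in_use(const struct inode *ino)
{
    return ino->nlink != 0;
}
```

Under the 4-byte int assumption, sizeof(struct inode) is 100 bytes, so inodes do not divide evenly into power-of-two block sizes; that is why some part 1 inodes can overlap two blocks.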

The size field of the inode is used to determine which data block pointers are valid. If the file is small enough to fit in N_DBLOCKS blocks, then the indirect blocks are not used. Note that there may be more than one indirect block. However, there is only one pointer to a double indirect block and one pointer to a triple indirect block. All block pointers are relative to the start of the data region.
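One way to turn the size field into a count of valid pointers is sketched below. It assumes that each indirect block holds blocksize/4 four-byte block pointers, which is the conventional Unix layout but is not stated explicitly in this handout. The struct block_counts type and the count_blocks function are hypothetical names.

```c
#define N_DBLOCKS 10
#define N_IBLOCKS 4

/* How many data blocks of a file are reached through each pointer level. */
struct block_counts {
    long direct;   /* through dblocks[] */
    long single;   /* through iblocks[] */
    long dbl;      /* through i2block */
    long triple;   /* through i3block */
};

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

struct block_counts count_blocks(long size, long blocksize)
{
    struct block_counts c;
    long ptrs = blocksize / 4;                       /* assumed: 4-byte pointers */
    long left = (size + blocksize - 1) / blocksize;  /* data blocks, rounded up */

    c.direct = min_long(left, N_DBLOCKS);
    left -= c.direct;
    c.single = min_long(left, N_IBLOCKS * ptrs);
    left -= c.single;
    c.dbl = min_long(left, ptrs * ptrs);
    left -= c.dbl;
    c.triple = left;
    return c;
}
```

For a 1024-byte block size, a file of 10241 bytes needs 11 data blocks: 10 direct and 1 reached through the first indirect block.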

The free block list is maintained as a linked list. The first four bytes of a free block contain an integer index to the next free block; the last free block contains -1 for the index.
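Walking the free block list might look like the sketch below. It assumes that free block indices, like the inode block pointers, are relative to the start of the data region, and that the host byte order matches the image. count_free_blocks is a hypothetical helper; it returns -1 if the chain leaves the data region or appears to loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the blocks on the free list starting at `head`.
 * data_region points at block 0 of the data region, which holds
 * nblocks blocks of blocksize bytes each. */
long count_free_blocks(const unsigned char *data_region, long nblocks,
                       long blocksize, int head)
{
    long count = 0;
    int cur = head;

    while (cur != -1) {                       /* -1 marks the end of the list */
        if (cur < 0 || cur >= nblocks || count >= nblocks)
            return -1;                        /* bad index or a cycle */

        int32_t next;                         /* first four bytes of the block */
        memcpy(&next, data_region + (size_t)cur * (size_t)blocksize,
               sizeof next);
        cur = next;
        count++;
    }
    return count;
}
```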

Part 1 specification

You will be given a disk image containing a file system. It will be correct (no corruption), and the free list of the superblock will list all the free inodes and free data blocks.

You should read in the disk, find inodes containing valid files, and write out a new image containing the same set of files, with the same inode numbers, but with all the blocks in a file laid out contiguously. Thus, if a file originally contained blocks {6,2,15,22,84,7} and it was relocated to the beginning of the data section, the new blocks would be {0,1,2,3,4,5}.
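The relocation step for one array of block pointers can be sketched as follows. This is only an illustration for a flat list of pointers such as dblocks; indirect blocks would need the same treatment applied recursively. relocate_blocks is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Copy each block named in ptrs[] from the old data region to the next
 * free slot in the new data region, and rewrite ptrs[] to the new
 * locations. Returns the next unused block index in the new region. */
int relocate_blocks(const unsigned char *old_data, unsigned char *new_data,
                    long blocksize, int *ptrs, int count, int next_free)
{
    for (int i = 0; i < count; i++) {
        memcpy(new_data + (size_t)next_free * (size_t)blocksize,
               old_data + (size_t)ptrs[i] * (size_t)blocksize,
               (size_t)blocksize);
        ptrs[i] = next_free++;   /* block now lives at its new index */
    }
    return next_free;
}
```

Starting with next_free set to 0 and pointers {6,2,15,22,84,7}, the pointers become {0,1,2,3,4,5} and the function returns 6, which is where the next file would begin.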

After defragmenting, your new disk image should contain the same boot block (just copy it), a new superblock with the same list of free inodes but a new list of free blocks sorted from lowest to highest (to prevent future fragmentation), new inodes for the files, and data blocks at their new locations.
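A sorted free list can be written in one pass, as sketched below. The sketch assumes that the defragmenter has packed all file blocks at the front of the data region, so that every block from first_free onward is free; the handout does not require that particular packing. build_sorted_free_list is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Link blocks first_free .. nblocks-1 into an ascending free list.
 * Each free block's first four bytes hold the index of the next free
 * block; the last one holds -1. Returns the new list head for the
 * superblock's free_block field, or -1 if there are no free blocks. */
int build_sorted_free_list(unsigned char *data_region, long nblocks,
                           long blocksize, long first_free)
{
    if (first_free >= nblocks)
        return -1;

    for (long b = first_free; b < nblocks; b++) {
        int32_t next = (b + 1 < nblocks) ? (int32_t)(b + 1) : -1;
        memcpy(data_region + (size_t)b * (size_t)blocksize,
               &next, sizeof next);
    }
    return (int)first_free;
}
```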

The output disk image should be named "datafile-defrag".

A sample disk image for you to work with is available on the web at http://www.cs.wisc.edu/~cs537-1/projects/p3/datafile-frag and in AFS at ~cs537-1/projects/p3/datafile-frag.

Part 2: forensics

Now that you know how to read and write file systems, you will apply this skill forensically to solve a crime.

On March 18, 1990, the Isabella Stewart Gardner Museum in Boston, MA, was robbed by two unknown white males dressed in police uniforms and identifying themselves as Boston police officers. The unknown subjects gained entrance into the museum by advising on-duty security personnel that they were responding to a call of a disturbance within the compound. Security, contrary to museum regulations, allowed the unknown subjects into the facility. Upon gaining entry, the two unknown subjects abducted the on-duty security personnel, securing both guards with duct tape and handcuffs in separate remote areas of the museum's basement. The unknown subjects brandished no weapons, nor were any weapons seen during this heist. Other than a "panic" button located behind the guards' watch desk area, the museum alarm system was internal only. Since the panic button was not activated, no actual police notification was made during the robbery. The video surveillance film was seized by the unknown subjects prior to their departure. While in the museum from the hours of 1:24 a.m. to 2:45 a.m., the unknown subjects seized many works of art, the values of which have been estimated as high as $300 million.

Just last week, during a routine traffic stop, police identified a possible subject. The subject's motor vehicle contained, among other items, a hard drive (without the laptop). Although the subject had apparently attempted to delete all files on the drive, the subject was naive (or in a rush) and only deleted the files rather than overwriting or reformatting the disk. This means that most data and indeed most of the file control blocks still reside on disk.

You are part of the forensics team attempting to reconstruct the disk's contents. You have been given a region of the disk to reconstruct. So far other members of your team have determined that the file system was on a little-endian machine running some form of UNIX with an inode structure:

  #define N_DBLOCKS 10 
  #define N_IBLOCKS 4  
  struct inode {  
      int unknown; /* Unknown field */  
      int protect;        /*  protection field */ 
      int nlink;  /* Number of links to this file */ 
      int size;  /* Number of bytes in file */   
      int uid;   /* Owner's user ID */  
      int gid;   /* Owner's group ID */  
      int ctime;  /* Time field */  
      int mtime;  /* Time field */  
      int atime;  /* Time field */  
      int dblocks[N_DBLOCKS];   /* Pointers to data blocks */  
      int iblocks[N_IBLOCKS];   /* Pointers to indirect blocks */  
      int i2block;     /* Pointer to doubly indirect block */  
      int i3block;     /* Pointer to triply indirect block */  
   };

a block size of 1024 bytes; and the owner's UID and GID appear to be 18390 and 9921 respectively (these are in decimal format).

The data file containing your assigned region of the disk is available at http://www.cs.wisc.edu/~cs537-1/projects/p3/datafile-mystery and in AFS at ~cs537-1/projects/p3/datafile-mystery. It should be 10485760 bytes after you download it. Be sure to download it as a binary file.

Note that for this file system, there is not a separate region of the disk for inodes. Instead, they can be stored in any block on the disk. However, the whole block will contain inodes (no mixed inodes and data).
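One possible way to spot candidate inode blocks is to look for the known UID and GID at the positions where those fields would fall. The sketch below is a heuristic, not a prescribed method. It assumes that inodes are packed back to back from the start of a block (ten 100-byte inodes per 1024-byte block), that int is 4 bytes, and that the host is little-endian like the image. count_matching_inodes and the macro names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 1024
#define TARGET_UID 18390
#define TARGET_GID 9921

#define N_DBLOCKS 10
#define N_IBLOCKS 4
struct inode {
    int unknown;              /* unknown field */
    int protect;              /* protection field */
    int nlink;                /* number of links to this file */
    int size;                 /* number of bytes in file */
    int uid;                  /* owner's user ID */
    int gid;                  /* owner's group ID */
    int ctime;                /* time field */
    int mtime;                /* time field */
    int atime;                /* time field */
    int dblocks[N_DBLOCKS];   /* pointers to data blocks */
    int iblocks[N_IBLOCKS];   /* pointers to indirect blocks */
    int i2block;              /* pointer to doubly indirect block */
    int i3block;              /* pointer to triply indirect block */
};

/* Count the inode-sized slots in one block whose uid and gid fields
 * match the suspect's. A nonzero result suggests an inode block. */
int count_matching_inodes(const unsigned char *block)
{
    int matches = 0;
    size_t per_block = BLOCK_SIZE / sizeof(struct inode);

    for (size_t i = 0; i < per_block; i++) {
        struct inode ino;
        memcpy(&ino, block + i * sizeof ino, sizeof ino);
        if (ino.uid == TARGET_UID && ino.gid == TARGET_GID)
            matches++;
    }
    return matches;
}
```

A match on both fields could still be a coincidence in a data block, so it is worth sanity-checking the other fields (for example, that size is non-negative and the block pointers fall inside the image) before trusting a candidate.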

Part 2 Specification

Your assignment is as follows:

  • Reconstruct any files that can be found in your assigned disk region.
  • Produce a list of any data blocks (numbered from zero) that are not used by the above files; these will be needed by other teams reconstructing other regions of the disk.
  • Identify the perpetrator if possible and explain why you suspect him or her.
  • Answer the following questions:
    1. Describe your algorithm for solving this problem. Note: you must provide a description of TWO algorithms:
      1. The steps by which you (the human) solved the problem (i.e., how did you construct the program?);
      2. The steps your finished program takes to solve the problem.
    2. What is the complexity of your algorithm (e.g., O(n)) in terms of the number of inodes? Number of data blocks?
    3. What files did you find? Provide a brief description of each (file format and, if known, what the contents represent). Hint: look for "magic numbers" (http://www.garykessler.net/library/file_sigs.html).
    4. What blocks were unidentified? Provide a list of these blocks.
    5. Which files, if any, use the indirect block? Doubly indirect? Triply indirect?
    6. If the inodes were not included in the data file, could these files still be reconstructed? Why or why not? If the inodes existed somewhere but the uid and gid were not known, could these files still be reconstructed? Why or why not?
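For the magic-number hint in question 3, a check against a few well-known signatures might be sketched like this. The list covers only a handful of common formats and says nothing about which formats actually appear in the data file; guess_type is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Guess a file's format from its leading bytes. */
const char *guess_type(const unsigned char *data, size_t len)
{
    static const unsigned char jpeg[] = {0xFF, 0xD8, 0xFF};
    static const unsigned char png[]  = {0x89, 'P', 'N', 'G',
                                         0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char gif[]  = {'G', 'I', 'F', '8'};
    static const unsigned char pdf[]  = {'%', 'P', 'D', 'F'};
    static const unsigned char zip[]  = {'P', 'K', 0x03, 0x04};

    if (len >= sizeof jpeg && memcmp(data, jpeg, sizeof jpeg) == 0)
        return "JPEG";
    if (len >= sizeof png && memcmp(data, png, sizeof png) == 0)
        return "PNG";
    if (len >= sizeof gif && memcmp(data, gif, sizeof gif) == 0)
        return "GIF";
    if (len >= sizeof pdf && memcmp(data, pdf, sizeof pdf) == 0)
        return "PDF";
    if (len >= sizeof zip && memcmp(data, zip, sizeof zip) == 0)
        return "ZIP";
    return "unknown";
}
```

Passing the first reconstructed data block of each file to guess_type gives a quick first guess; anything reported as "unknown" can then be checked by hand against the signature table linked above.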

What to turn in

Turn in your code and writeup to the directory ~cs537-1/handin/yourname/p3 by the specified due date for the code.

Grading Policy

To be determined ...

Page last modified on April 16, 2009, at 11:32 AM