Mini Project for CS 736

    Due on Feb. 22nd, 11:59pm







    Overview

      In this assignment, you will practice your skills in (simple) systems programming, performance evaluation, and writing.

      What you are going to do:

      1. You have to pick a platform to study. Any Unix-based system (such as a PC running Linux) is acceptable. For this assignment, please do not use Windows-based systems.
      2. You are going to run some simple experiments, which you design to bring out various properties of the file system under test:
        • block size
        • prefetching size
        • buffer-cache size
        • number of direct pointers in the inode.
      3. You will write up what you did, explaining your design and using graphs to demonstrate the file system properties you measured.

      You may work with one partner on this project. This mini project will help you develop design, measurement, and writing skills for later use in your final project, so do a good job here and it will pay off down the road.

      The deadline is midnight on Feb. 22nd (submit to me shanlu@...). This is a firm deadline. No extensions.

    More Details

      In this assignment, we're going to explore the inner workings of the (Unix-style) file system. Our main approach is to write small code snippets that exercise the file system in different ways (you will likely use system calls such as open(), close(), read(), write(), lseek(), and fsync()); then, by measuring how long various operations take, we will try to deduce what the file system is doing.
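
      As a rough illustration of such a snippet, the sketch below appends a buffer to a file, forces it to disk with fsync(), and reports the elapsed wall-clock time. The helper names timed_append and now_sec are made up for this example, and gettimeofday() is used only as a simple default timer; treat it as a starting point under those assumptions, not as the required approach.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Append `len` bytes to `path` and force them to disk with fsync().
 * On success, stores the elapsed seconds in *elapsed and returns 0;
 * returns -1 on any error.
 * (Hypothetical helper: the name and signature are illustrative only.)
 */
int timed_append(const char *path, size_t len, double *elapsed)
{
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'x', len);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Only the write and the fsync are inside the timed region. */
    double start = now_sec();
    ssize_t written = write(fd, buf, len);
    int synced = fsync(fd);
    double stop = now_sec();

    close(fd);
    free(buf);

    if (written != (ssize_t)len || synced != 0)
        return -1;
    *elapsed = stop - start;
    return 0;
}
```

      Calling timed_append() repeatedly on the same file and recording each elapsed time gives one latency sample per append, which can then be plotted against the file size.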

      Step 0: Selecting the Platform

      Pick a platform to work on. Very likely it will be something like a PC running Linux, but please feel free to be adventurous, e.g., FreeBSD, some ugly old Unix system like AIX, or even Mac OS X (keep in mind that some platforms might make your measurements harder than others). However, please do use a Unix-based system (not Windows).

      Do a little research on the file system before you start designing your experiments! Some systems, such as MacOS HFS and Linux XFS, use extents in the inode instead of block pointers, making it almost impossible to measure the block size. The file system layout may determine which experiments you can run.

      To measure information about the cache, you will need to be able to control what is in the cache and what is not. There are several ways of accomplishing this, both with and without root privilege.
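
      One way to do this without root privilege, assuming a Linux system, is to ask the kernel to drop a file's cached pages with posix_fadvise() and the POSIX_FADV_DONTNEED advice. The sketch below is only that: the helper name evict_file_from_cache is made up, the request is advisory, and calling fsync() on a read-only descriptor is assumed to work as it does on Linux. With root privilege on Linux, writing to /proc/sys/vm/drop_caches is another option. Either way, confirm the effect by timing a re-read.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to drop the cached pages of `path` (Linux behavior assumed).
 * Dirty pages cannot be dropped, so the file is flushed with fsync() first.
 * The request is advisory: always verify its effect by timing a re-read.
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper: the name is illustrative only.)
 */
int evict_file_from_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (fsync(fd) != 0)
        rc = -1;

    /* Offset 0 with length 0 means "the whole file". posix_fadvise()
       returns an error number directly instead of setting errno. */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        rc = -1;

    close(fd);
    return rc;
}
```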

      Step 1: Trying the Timer

      The accuracy and granularity of the timer you use will often have a large effect on your measurements. Therefore, you should use the best timer available (though the choice is ultimately yours). Fortunately, on x86 platforms, a highly accurate cycle counter is available through the rdtsc instruction.

      Hence, the first thing you should do is figure out how to use rdtsc or its analogue on other platforms (you can google to find out more). Once you know how to call it and get a cycle count, convert the result to seconds and measure how long something takes (e.g., a program that calls sleep(10) and exits should run for about 10 seconds). Confirm that your results make sense by comparing them to a less accurate but reliable timer such as gettimeofday. Note that confirming timer accuracy is hugely important! If you don't trust your timer, how can you trust the results of your measurements?
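
      A minimal sketch of that calibration is shown below. It assumes GCC or Clang, where the __rdtsc() intrinsic from x86intrin.h reads the cycle counter; on other architectures it falls back to clock_gettime() as a stand-in tick source. The helper names read_ticks, wall_sec, and ticks_per_second are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/time.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Raw tick counter: rdtsc on x86; elsewhere, nanoseconds from
   clock_gettime() serve as a stand-in. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Reference wall-clock time in seconds from gettimeofday(). */
static double wall_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Estimate the tick rate (ticks per second) by counting ticks across a
 * sleep whose real duration is measured with gettimeofday().
 * (Hypothetical helper: the name is illustrative only.)
 */
double ticks_per_second(long sleep_ms)
{
    struct timespec req = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };

    double t0 = wall_sec();
    uint64_t c0 = read_ticks();
    nanosleep(&req, NULL);
    uint64_t c1 = read_ticks();
    double t1 = wall_sec();

    return (double)(c1 - c0) / (t1 - t0);
}
```

      If repeated calls with different sleep lengths give noticeably different rates, that is a sign the tick counter is not trustworthy on your platform.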

      Note: rdtsc may not be the best choice on every platform. In fact, you might find rdtsc very inaccurate on some platforms. In that case, you should choose another way to measure time, e.g., gettimeofday. In any case, let your experiments decide.

      Step 2: Measuring the File System

      After getting our timer in order, we will move on and measure some properties of the file system. All measurements should be done on the local disk of some machine - do not measure the performance of a distributed file system such as AFS, where, for example, your CS account resides. If you aren't using your own machine, you might consider the Crash and Burn lab or just one of the other computer labs in the building. Through experiments that you design, implement, run, and measure, you are to answer the following questions:

      • How big is the block size used by the file system to read data? Hint: use reads of varying sizes and plot the time it takes to do such reads. Also, be wary of prefetching effects that often kick in during sequential reads.
      • During a sequential read of a large file, how much data is prefetched by the file system? Hint: time each read and plot the time per read.
      • How big is the file cache? Hint: Repeated reads to a group of blocks that fit in cache will be very fast; repeated reads to a group of blocks that don't fit in cache will be slow.
      • How many direct pointers are in the inode? Hint: think about using write() and fsync() to answer this question. Also think about what happens when you extend a file and suddenly an indirect pointer must be allocated -- how many more writes occur at that point?
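
      The hints about timing each read can be sketched as follows. The helper reads a file sequentially in fixed-size chunks and records the latency of every read() call. The names time_sequential_reads and mono_sec are made up for this example, and clock_gettime() stands in for whichever timer you validated in Step 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in seconds (stand-in for your validated timer). */
static double mono_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Read `path` sequentially in chunks of `chunk` bytes and store the
 * latency of each read() in times[0 .. max_reads-1].
 * Returns the number of reads timed, or -1 on error.
 * (Hypothetical helper: the name and signature are illustrative only.)
 */
long time_sequential_reads(const char *path, size_t chunk,
                           double *times, long max_reads)
{
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long n = 0;
    while (n < max_reads) {
        double start = mono_sec();
        ssize_t got = read(fd, buf, chunk);
        double stop = mono_sec();

        if (got < 0) {          /* read error */
            n = -1;
            break;
        }
        if (got == 0)           /* end of file */
            break;
        times[n++] = stop - start;
    }

    close(fd);
    free(buf);
    return n;
}
```

      Plotting times[i] against the offset i * chunk, starting from a cold cache, is one way to look for periodic jumps in latency; what those jumps mean is for your experiments to establish.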

      In your write-up, you should have one or more graphs which you use to directly answer the questions above.

      Note: A major issue with any measurement is: how convincing are your numbers? In general, you need to be critical of your own work and try your best to make each experiment more solid. Specifically, you need to use repetition and averaging to increase your (and my) confidence in your data. That is, you should take multiple measurements of an event and compute (for example) an average over many runs. You will also need to watch for experimental noise.
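
      As a small sketch of the repetition-and-averaging step, the helpers below compute the mean and the sample variance (with the n - 1 denominator) of a set of repeated measurements. The names sample_mean and sample_variance are made up for this example; the square root of the variance gives the standard deviation, which is a natural size for error bars.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n measurements (n must be at least 1). */
double sample_mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/*
 * Sample variance of n measurements, using the n - 1 denominator
 * (n must be at least 2). Two passes: first the mean, then the
 * squared deviations from it.
 */
double sample_variance(const double *x, size_t n)
{
    assert(n > 1);
    double mean = sample_mean(x, n);
    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        sq += d * d;
    }
    return sq / (double)(n - 1);
}
```

      A large variance relative to the mean suggests noise (other processes, caching effects, timer jitter) that should be reduced or at least reported alongside the average.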

      Step 3: Writing It Up

      After you are done with your experiments, you'll need to write up what you've done. Please include the following sections in your report (a short paper :)).

      • Title
      • Author: Right under the title, this says who you are.
      • Abstract: This is the paper in brief and should state the basic contents and conclusions of the paper. The abstract is not the introduction to the paper (it should be shorter), but is a summary of everything. Read some of the abstracts of papers we've read for class to get a better idea. In general, the abstract is an advertisement that should draw the reader into your paper, without being misleading. It should be complete enough to understand what will be covered in the paper.
      • Intro: A short overview of what you did, and what you learned. More motivation than the abstract, and more details. Again, make sure you include your main conclusions.
      • Methodology: How you measured what you measured. Include something about your timer accuracy here, as well as a description of the platform you are using to the level of detail such that someone else could reproduce the experiment elsewhere.
      • Results: This section should consist mainly of graphs, addressing each of the questions above. Make sure that graphs have axes labeled (including units). Also make sure to include the code snippets with each graph (or some rough description of them) so we have an idea what exactly you measured. Also, make sure to draw appropriate conclusions about each graph.
      • Conclusions: Summarize your conclusions here, and talk about what else you have learned in the process.

      Requirements and Grading:

      This paper should be at most 6 pages long (including everything), in 10 point or larger font, in double column format.
      In your write-up, you should not re-describe the assignment.
      Your paper must be written using proper English grammar and should have no spelling mistakes.

      Grading:

      The paper will be graded as follows:
      • Presentation: 1/3 How well written and structured is the paper? Are the figures and tables legible?
      • Methodology: 1/3 Is the methodology sound? Will it accurately measure and return the correct results? Does the reader have confidence in your results?
      • Explanation: 1/3 Do you explain your results completely? Are all features of your graphs explained?

      Note: I strongly recommend you write your report/paper in LaTeX (if you really love Word, that is o..k..). Please see more information on writing here.

      Step 4: Turn it in

      Please email me (shanlu at cs wisc edu) your paper by midnight on the project due date (Feb. 22nd).

      Final Notes

      Computer systems are complicated. You probably won't have fun with this project unless you start early.

    Acknowledgment: Thanks to Andrea A.-D., Remzi A.-D., and Mike Swift for providing the original version of this mini-project.