Project 2: Basic Distributed System

Important Dates

Due: Friday 10/20.

Overview

The project introduces you to the fundamentals of simple distributed systems. It is to be done in groups of 2 or 3 (we suggest you do not do it alone). The basic idea is to build a simple distributed file system. Details below!

Details

Overview: A Basic NFS-Like Distributed File System

You'll be building a simple distributed file system. The file system will assume an NFS-like protocol; read about NFS here for details. Upon file open, the client will resolve the path one piece at a time in order to obtain a file handle for the file. Subsequent read and write requests will use the file handle to perform reads and writes to the server. The most basic version of your NFS system will not cache file data at the client; however, there is one optimization you will implement as described below.
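The piece-by-piece resolution described above can be sketched as follows. This is only an illustration, not the required design: `FakeServer`, its integer handles, and the `resolve` helper are all hypothetical names, and the in-memory table stands in for a real LOOKUP request sent over the network.

```python
from typing import Dict, Tuple


class FakeServer:
    """Hypothetical in-memory stand-in for the server's LOOKUP handler."""

    ROOT = 1  # assumed handle for the root directory

    def __init__(self) -> None:
        # (parent handle, name) -> child handle
        self.entries: Dict[Tuple[int, str], int] = {}
        self.next_handle = 2

    def add(self, parent: int, name: str) -> int:
        """Register a child under a parent and return its new handle."""
        handle = self.next_handle
        self.next_handle += 1
        self.entries[(parent, name)] = handle
        return handle

    def lookup(self, parent: int, name: str) -> int:
        """One LOOKUP: resolve a single name inside one directory."""
        try:
            return self.entries[(parent, name)]
        except KeyError:
            raise FileNotFoundError(name) from None


def resolve(server: FakeServer, path: str) -> int:
    """Walk the path one component at a time, one LOOKUP per component."""
    handle = FakeServer.ROOT
    for component in path.split("/"):
        if component:  # skip empty pieces from leading or doubled slashes
            handle = server.lookup(handle, component)
    return handle
```

Resolving `/a/b.txt` here issues two LOOKUPs (first `a` in the root, then `b.txt` in `a`), and the final handle is what later READ and WRITE requests would carry.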

Your file system will integrate into the Linux client side using FUSE. FUSE allows you to build something that looks like a file system but actually runs in user space.

The server starts up with a directory pathname; this path is where the files in this file system reside (i.e., the persistent state of the file system is stored). You will have to figure out how to store files in the server so as to be able to respond to various NFS protocol requests such as LOOKUP, READ, and WRITE.
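One possible (and deliberately simple) way to back LOOKUP and READ with the server's directory is sketched below. It assumes a file handle is just the path relative to the server's root directory, with the empty string as the root handle; the class `ExportedDir` and its method names are hypothetical. Real NFS servers typically build handles from inode and generation numbers instead, so treat this as a starting point for thinking about the trade-offs rather than a recommended design.

```python
import os


class ExportedDir:
    """Hypothetical server-side store: handles are paths relative to root."""

    def __init__(self, root: str) -> None:
        self.root = os.path.realpath(root)

    def _full(self, handle: str) -> str:
        """Map a handle to a real path, refusing anything outside root."""
        full = os.path.realpath(os.path.join(self.root, handle))
        if os.path.commonpath([self.root, full]) != self.root:
            raise PermissionError(handle)
        return full

    def lookup(self, dir_handle: str, name: str) -> str:
        """LOOKUP: return the handle for `name` inside `dir_handle`."""
        child = os.path.join(dir_handle, name)
        if not os.path.lexists(self._full(child)):
            raise FileNotFoundError(child)
        return child

    def read(self, handle: str, offset: int, count: int) -> bytes:
        """READ: return up to `count` bytes starting at `offset`."""
        fd = os.open(self._full(handle), os.O_RDONLY)
        try:
            return os.pread(fd, count, offset)
        finally:
            os.close(fd)
```

Because these handles are derived from paths, they survive a server restart without any extra bookkeeping, but they would go stale if a file were renamed; that is one of the design questions to weigh.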

You should also support basic file system functionality, e.g., the ability to make/delete directories and the ability to create/delete files. No support for links of any kind is needed. You also don't need to worry about security or even the notion of multiple user IDs (for simplicity). However, you should be able to have multiple clients running and accessing the file system and even accessing the same files in a way that is consistent with NFS.

The server should of course store files persistently. Exactly how you do that is up to you; think about designs that are both easy to build and able to deliver high performance.

You can use either your own communication package or gRPC or Thrift. Think about which you should use and be able to justify your decision.

Batching Writes Optimization

One problem with classic NFS v2 WRITE protocol requests is that they must be performed synchronously by the server: the data has to reach disk before the server replies. This makes each write quite slow from the client's perspective. A small optimization that you will add here is a COMMIT protocol message. COMMIT allows each WRITE to be ack'd asynchronously. The way this works is that each WRITE is sent by a client and immediately ack'd by the server; only when COMMIT is issued by the client do all the previous WRITEs get committed to disk, thus enabling them to be batched into a single larger I/O. Once the COMMIT is ack'd, the client knows it can release the copies it is keeping in case of retransmission. You can read more about COMMIT online in the NFS v3 spec.
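A minimal server-side sketch of the WRITE/COMMIT split, under stated assumptions: `BufferedFile` and `pwrite_all` are hypothetical names, pending data lives in a plain in-memory list, and one object tracks one file. It is meant only to show where the ack and the disk I/O are separated, not how the server must be structured.

```python
import os


def pwrite_all(fd: int, data: bytes, offset: int) -> None:
    """Write all of `data` at `offset`, looping over short writes."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


class BufferedFile:
    """Hypothetical per-file state: WRITEs are buffered, COMMIT flushes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.pending = []  # list of (offset, data) not yet on disk

    def write(self, offset: int, data: bytes) -> int:
        """WRITE: remember the data and ack immediately (nothing on disk)."""
        self.pending.append((offset, data))
        return len(data)

    def commit(self) -> int:
        """COMMIT: apply every pending WRITE, then force them to disk once."""
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            for offset, data in self.pending:
                pwrite_all(fd, data, offset)
            os.fsync(fd)  # a single sync covers the whole batch
        finally:
            os.close(fd)
        flushed = len(self.pending)
        self.pending.clear()
        return flushed
```

Note that anything still in `pending` is lost if the server crashes before COMMIT, which is exactly why the client keeps its own copies until the COMMIT is ack'd.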

Server Crash Recovery

Part of the difficulty of any such endeavor is the handling of crashes. Most important is crash recovery of the server; make sure that when it crashes and reboots, the clients do not notice except in terms of performance. What does the server need to do to handle crashes properly? What are points of danger during its operation? Make sure your demo also goes over server crashes and shows that your system works in spite of them (assuming a prompt recovery).
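On the client side, one common way to hide a prompt server reboot is simply to retransmit the request until it succeeds, which is safe when requests are idempotent. The sketch below assumes a failed call surfaces as Python's `ConnectionError`; `call_with_retry`, its parameters, and the backoff values are hypothetical choices for illustration.

```python
import time


def call_with_retry(rpc, *args, delay=0.01, max_delay=1.0, sleep=time.sleep):
    """Retry `rpc(*args)` until it succeeds, backing off between attempts.

    Assumes the server recovers promptly (fail-recover) and that `rpc`
    is idempotent, so re-sending a request after a crash is harmless.
    """
    while True:
        try:
            return rpc(*args)
        except ConnectionError:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Retrying alone does not cover WRITEs that were ack'd but not yet committed when the server crashed; the client must also be able to notice the reboot and resend them. NFS v3 addresses this with a write verifier returned in WRITE and COMMIT replies.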

In general, you can assume that machines crash and are rebooted promptly. This is sometimes called fail-recover behavior.

Measurements

In addition to a demo (which should show that the basic file system works in the way that is expected, even with multiple clients accessing the same file), you should perform some measurements of your system to understand how it works and showcase its performance. What aspects of the system should you measure? Can you showcase its performance during normal operation, during server failure, the benefit of COMMITs, etc.?
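As one example of a measurement, the sketch below times a sequence of local writes with a sync after every write versus a single sync at the end, which roughly mirrors the synchronous-WRITE versus COMMIT comparison. It runs against a local file only; the function name `timed_writes` and its parameters are hypothetical, and a real experiment would issue the writes through your mounted file system instead.

```python
import os
import time


def timed_writes(path: str, n_writes: int, size: int, sync_each: bool) -> float:
    """Return seconds taken to write n_writes blocks of `size` bytes.

    sync_each=True  -> fsync after every write (like synchronous WRITEs)
    sync_each=False -> one fsync at the end   (like WRITEs plus a COMMIT)
    """
    payload = b"x" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.perf_counter()
    try:
        for _ in range(n_writes):
            os.write(fd, payload)
            if sync_each:
                os.fsync(fd)
        if not sync_each:
            os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start
```

Repeating each configuration several times and reporting the spread, not just one number, makes the resulting graphs much more convincing.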

Machines To Use

For this project, you might want to use Google Cloud or AWS or Microsoft or some other cloud offering. One reason to do so: it is good to know how to use modern cloud services, and usually, you can get some initial account for free.

Handing It In

To turn this project in, you'll just meet with me and TA Kan and run a demo of what you have done. You'll also bring a few graphs that help show the behavior of your system, and a short (1-2 page) writeup of your system. We will give a little more detail in class.

You will also place your code into one partner's handin directory. From the other partners' handin directories, create soft links to this directory.