A "mod" is a modification of something, usually used in the context of a video game or hot-rod car. Here, we instead explore "mods" to distributed systems we have studied. Read the questions and explore these mods. And remember to have fun! Or, at least, remember to finish all eight questions.

Answers should not be "too long". This means you should not write more than a page for each answer; often less is fine.

The rules for the exam:

- You *may* consult the papers and other information sources as needed.
- You *may not* talk to anyone else about the exam *except* me (Remzi).
- If you have a pressing question, *send email* to me; do *not* post to Piazza.

To turn it in, send me your answers via email in either plain text (e.g., an email) or a PDF of some kind (scanned, or produced from some program). Please no Word documents or other formats - thanks!

And good luck!

------------------------------------------------------------------------------

1. Paxos is often used at the heart of a replicated state machine to build a fault-tolerant distributed system. But Paxos can be overly constricting. For example, imagine we are using Paxos to replicate a storage system of some kind. Paxos generates a strict order of the commands issued to each state machine, thus ensuring replicas remain in sync. However, when commands arrive (say, at an elected leader), they generally could be reordered for efficiency; thus, some buffering and reshuffling of commands might be useful for Paxos-based systems. Describe some situations where command reordering would be useful in the context of a Paxos-driven replicated key-value storage system. Estimate how much performance benefit you can get from this modification. What changes would be required of the protocol, if any?

2. Raft uses a strong leadership model. However, sometimes the leader may not be performing well.
Thus, in this question, we seek to change the Raft protocol or aspects of the implementation (including the client library) to remedy this situation. How would you change Raft to deal with underperforming leaders? In designing your solution, assume that in some cases the leader is mostly cooperative, and in others it may not be.

3. Weaker models of consistency can be quite useful, particularly in wide-area settings. In these systems, problems occur due to dependencies between operations. Say, for example, a user uploads a document to a system, and then adds a link to the new document into an existing document. The new link (reference) "depends" on the document getting uploaded first; if the other order occurs, a user may click on the link and not find the document. Assuming Dynamo-style consistency, describe (in detail) how dependency problems might arise assuming users try to store documents with links (references) in them inside Dynamo. Then, discuss how you might "fix" this problem -- in robust ways, or perhaps with hacks -- so that users never see a "broken" link.

4. The designers of LBFS have found that it is too slow; the expensive Rabin fingerprinting and SHA-1 hashing schemes have both added excessive overheads. To fix these problems, the designers changed two things. First, they changed from variable-size Rabin fingerprinting to fixed-size blocks of size 4KB. Second, they changed the SHA-1-based hash to a new function that returns a hash over a 4KB block as follows:

    unsigned int hash(char buffer[], int size) {
        unsigned int hash = 0;
        for (int i = 0; i < size; i += 2) {
            hash += (unsigned int) buffer[i];
        }
        return hash;
    }

What problems might these changes induce in LBFS? Assuming they increase performance, are these good changes?

5. Load balancing is a hard problem, due to many different goals. One simple goal is to ensure all machines upon which load is placed are highly utilized.
Another goal is to ensure they are efficiently utilized, for example by being aware of the effects of caching. The LARD system we read about tries to balance these needs. However, LARD assumes a centralized load balancer which can adjust load as needed to meet utilization and locality needs. In this question, you will modify LARD to work with multiple load balancers. It should generally be assumed that the load balancers do not communicate directly with each other; however, when they place load onto a backend server, the response from that server can share extra information if needed. How would your modifications to LARD ensure that it works well despite balancing load from multiple front ends?

6. GFS does not deliver purely "strong" consistency, but rather a more nuanced form of consistency, with differences based on whether writes or appends are performed. Assume you change GFS only to use appends. Now, however, you wish to ensure that appends are strongly consistent. How would you change the GFS protocols so as to ensure such strong consistency? What implications would your changes have on the behavior of GFS?

7. BigTable has a number of interesting optimizations to improve performance or optimize space or otherwise improve the system. One is the locality group. Describe the ways in which locality groups improve BigTable. Estimate, to the extent that you can, how much of a difference such locality groups make. What information do you need to make a good estimation of their benefits? If our "mod" was to remove locality groups (say, for simplicity), how much would be lost?

8. Petal uses simple techniques to distribute data among disks. Consistent hashing uses more sophisticated techniques to provide better properties when nodes are added or deleted. Describe how to modify Petal to use consistent hashing. Which structures would have to change? Estimate how much benefit such an approach would provide as compared to Petal's current behavior.